Information Access Solutions for
Media and Publishing
Endeca Business White Paper




ENDECA
55 Cambridge Parkway Cambridge, MA. 02142
Telephone 617.577.7999



TABLE OF CONTENTS




1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES

    1.1. Introduction
    1.2. The Negative Business Impact
       1.2.1. Poor customer acquisition and retention
       1.2.2. Lost revenues
       1.2.3. High technology costs, high content management costs
    1.3. Technology Obstacles to Traditional Information Access and Retrieval
       1.3.1. Search behavior: human-centered design
       1.3.2. Why traditional search technologies often fail
       1.3.3. Managing complex content


2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA

    2.1. Guided Navigation – A Breakthrough Technology
       2.1.1. Faceted navigation overcomes the limits of taxonomy solutions
       2.1.2. An intuitive, easy-to-use interface
       2.1.3. The power to search both structured and unstructured data
    2.2. Advanced Search Features
       2.2.1. Integrated search and Guided Navigation
       2.2.2. Sharp answers to fuzzy questions
       2.2.3. Adding structure to unstructured content
       2.2.4. Targeted searching
    2.3. Content Spotlighting
    2.4. Additional Platform Features
       2.4.1. Single interface to multiple data sources
       2.4.2. Open architecture
       2.4.3. A high-performing, low-cost infrastructure


3. ENDECA ROI

    3.1. Improved Customer Retention and Acquisition
    3.2. Increased Revenues
       3.2.1. Transaction revenues
       3.2.2. Advertising revenues
       3.2.3. Subscription and registration revenues
       3.2.4. Licensing revenues
    3.3. Lower Total Cost of Ownership


4. CONCLUSION


5. FOOTNOTES




© 2005 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of
Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.
1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES

1.1. Introduction

Information providers of all types, including directories, news and magazine publishers, and multimedia content suppliers,
continue to make substantial investments in their online business. This market is still growing quickly, as consumers
spend a larger percentage of their time online, and revenue dollars follow them. For example, online ad spending grew
28.6% in the second quarter of 2005 while newspaper print ads grew only 1.9% in the same period.1

To take best advantage of this opportunity, traditional media and publishing companies have diversified into online delivery
– investing heavily in popular web technologies and IT resources. But this is only half the battle. The fight for market and
wallet share on the web is equally fierce. New, free online media – web search engines, portals, and blogs – are gaining
momentum in the traditional media space. In 2004, 37% of households used the free web as their only information
source.2 This free content puts downward pressure on margins – and turns content into a commodity.

To win online, media and publishing companies must differentiate themselves by offering not only premium content, but
also a better user experience. But getting content online – and, more important, making it easily accessible – isn’t simple.
Backlogs of proprietary information are often huge, and content and data sources are proliferating almost exponentially.

Even once this information is online, it usually isn’t easy to find. Users enter a query and often get a million results or,
worse, none at all. Then they have no meaningful way to browse further to find what they are looking for, aside from
taking another shot in the dark. This is the same frustrating search experience they have on the free web.

But for companies in the business of providing information online – which have to compete with the free web – successful
information access is paramount. Poor search has a serious negative business impact on customer retention and acquisition
and, consequently, revenues. Information access failures can affect short-term and long-term profitability.

What can companies do about this problem, and why should they do it, considering their already significant IT investments?
Here’s a closer look at how these search failures negatively affect the business, why they occur, and what new
solution can improve search success to help differentiate a site and provide competitive advantage.

1.2. The Negative Business Impact

Over time, information access and retrieval difficulties produce negative business consequences in three areas – customer
acquisition and retention, revenue, and profitability.

1.2.1. Poor customer acquisition and retention

With an increasing array of information sources available – including commodity search engines like Google and Yahoo and
content aggregators such as wikis and blogs – premium content providers must first attract customers away from these
free resources, and then ensure that they stay on the site. Even if there is unique content on the site, users will quickly
abandon it if they aren’t able to find that content. Customers will stay on the site and continue to come back to that site if
they believe that there is valuable content available and they have an easy way to access it.

These search failures can have a short- and long-term impact on the success of the business. In the short term, search
failures diminish customer satisfaction and loyalty and, if constantly repeated, result in failed relationships and lost
customers. What’s more, if enough customers defect because of poor search and retrieval, negative word-of-mouth spreads.
The ability to attract new customers is hampered, brand equity is destroyed, and market share begins to deteriorate. The
company is suddenly at a significant competitive disadvantage.

In short, just providing users with premium content isn’t enough to get a competitive edge if that content is not easily
accessible. A superior user experience – one that is better than the hit-or-miss searches of free Internet resources – is a
necessary differentiator to encourage customer loyalty and site usage.

1.2.2. Lost revenues

Information access failures also lead to lost revenues in several different areas, including subscription and licensing
revenue, transaction or sales revenue, and advertising revenue.



Online publishers that rely on subscription fees as a primary revenue source need an effective information access solution
in order to acquire and retain subscribers. If users can’t easily find relevant content, they won’t see the value in paying for
a subscription or in renewing a current one. As a result, overall revenues from the
subscription service will decline. Multimedia content providers that depend on licensing revenue will also see a decline
in business if their customers fail to renew their license agreements because they are unable to find the content that they
are seeking. Competitors with better search experiences will steal these users.

A primary source of revenue for some media and publishing companies may be from sales transactions – commonly
referred to as “pay-per-piece” transactions. For example, in addition to their subscription-based services, market research
firms sell their reports individually, and some online publishers sell certain articles individually. Multimedia content
providers may sell photos and other graphic images, audio files, and video files on a per-piece basis. Each failed search is
a lost transaction.

Poor search also has a subtler, but just as significant, negative impact on advertising revenues. Over the past few years,
the focus for directories and information publishers has primarily been on moving their data from print to online
publication rather than on making that data easy to find. As a result, the search experience is poor, and users are not
getting the value that they expect from these sites.
Now add to this content access problem the fact that advertisers spend money on sites that have high page views and give
them the ability to target ads to relevant customers. The result is twofold: low traffic, as users quickly abandon the site to
look for an alternative source to find the information, and also low advertising revenue, as advertisers abandon the site
because their click-through and conversion rates are low.

To lure advertisers to their site, companies need not just a certain volume of traffic, but high-quality traffic, which can
typically be measured by the number of unique visitors to the site and the number of page views per unique visitor or per
session. These traffic figures speak directly to the volume and quality of the traffic, and, consequently, affect the advertising
revenues generated by the site. Publishers charge advertisers on a CPM (cost per thousand impressions) or CPC (cost
per click) basis. If an advertiser believes that it is going to receive high-quality traffic (and a large volume of it), it will be
willing to pay a higher CPM or CPC.
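
As a rough illustration of the pricing arithmetic above, the following Python sketch computes ad revenue under the two pricing models. The function names and all traffic and price figures are hypothetical, chosen only to show how page views translate into revenue; they are not drawn from any actual site.

```python
def cpm_revenue(impressions: int, cpm: float) -> float:
    """Revenue when the advertiser pays a fixed price per thousand impressions."""
    return impressions / 1000 * cpm


def cpc_revenue(impressions: int, click_through_rate: float, cpc: float) -> float:
    """Revenue when the advertiser pays a fixed price per click."""
    clicks = impressions * click_through_rate
    return clicks * cpc


# Hypothetical site: 200,000 unique visitors, 5 page views each, one ad per page.
impressions = 200_000 * 5

print(cpm_revenue(impressions, cpm=4.00))
print(cpc_revenue(impressions, click_through_rate=0.01, cpc=0.50))
```

In this sketch, halving the page views per visitor halves the impressions, and with them both revenue figures, which is why page views per visitor matter to publishers as much as visitor counts do.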

Site traffic also affects the amount of ad inventory. The fewer the page views, the less ad inventory there will be to
sell to potential advertisers. In the worst-case scenario, advertisers fail to patronize the site at all because of the
potentially poor traffic metrics. Search failures therefore damage another important revenue stream for media and
publishing companies – advertising revenue. A high-quality user experience, especially one that provides easy access to
sought-after content, can and will determine the amount of ad revenues generated in the short and long term.

1.2.3. High technology costs, high content management costs

Most media and publishing companies have already made a large investment in hardware, software, and IT talent in order
to implement their site and get their premium content online. Adding search capabilities often requires multiple expensive
servers to handle the volume of traffic and the added complexity of searching large volumes of data. What’s more, with
most search technologies, maintaining and updating the content once it’s online can be equally expensive, especially if the
site requires the creation and maintenance of hard-coded taxonomies. Other search engines may also require expensive
hardware in order to add content or update the existing content because of the complexity of the software application.
Business user tools are essential to a high-quality user experience, but they are often non-existent, expensive, or
difficult to use. When the number of users and the amount of revenues stagnate or decline, all of these
costs together can result in a high total cost of ownership and, consequently, lower total profitability.

These negative impacts are interrelated. Their common underlying problems lie in the inherent limitations of commonly
used search technologies, especially for searching complex information collections involving multiple data sources that
include both structured and unstructured data types.

1.3. Technology Obstacles to Traditional Information Access and Retrieval

Many web design packages and other related software, such as content management systems, come equipped with
search capabilities. Companies often choose to either leverage these existing systems for search or buy other traditional
search technologies. These companies quickly discover that this traditional search functionality is limited in its ability to
actually find the relevant information. This is particularly true if the data has both structured and unstructured
characteristics, is large in volume and complexity, and resides in disparate repositories. A closer look at human search behavior
and traditional search technologies shows why these limitations exist.


1.3.1. Search behavior: human-centered design

When searchers are having a hard time finding the right content, it’s not for lack of ingenuity; it’s for lack of the right tools
to match their inventiveness and flexibility. While many search software companies have upgraded their technology to
improve existing tools, the right approach lies in first studying human searching behavior to figure out which tools are the
right tools to build. While such user-centered design is considered a best practice in other endeavors, it has been
overlooked in the critical field of information access and retrieval – until now.

Research conducted by top information scientists and user experience experts supports the notion that people looking for
information follow consistent behaviors over a wide range of tasks. They follow a particular pattern of behavior as they
initiate a task, which changes as they proceed through the task, and then as they either finish or abandon a task. In order
to continue to make progress, they need a different set of tools at each step of the process. The appropriate approach,
missed by traditional search technology, arms users with all the tools they need to find what they want. When users finally
have the right tools, the tools themselves feel intuitive and transparent, creating a superior user experience and resulting
in customer satisfaction and loyalty. In short, the path to business objectives lies in supporting user goals.3

1.3.2. Why traditional search technologies often fail

Traditional search technologies each have their inherent limits:

        • Keyword search: The effectiveness of keyword (or full-text) search relies on users to predict what might constitute
          a good query, yet paradoxically, they don’t yet know enough about the content to know what to ask. Prediction fails
          because wrong guesses yield the extremes of “no results,” or too many answers, especially in response to broad
          keywords – and no helpful guidance on why their prediction failed. For example, searching on a generic term like
          “sports” in a large collection of news articles can generate thousands of results. Furthermore, relevance
          algorithms fail to put the most useful results at the top of long lists. If users don’t know precisely what to ask for as they
          initiate their search task and there is no effective way to narrow the list of results, users will quickly abandon the
          site and look for the information elsewhere.

        • Navigating taxonomies or fixed classifications: For information seekers to navigate fixed hierarchical taxonomies,
          they have to make predictions about where to find the content they want. Are all the possible articles about the war
          in Iraq under the “International” branch in “News”? Certainly not. There may also be articles about the war in Iraq
          under “Terrorism” in the “Politics” branch. In other words, some information may be hidden because customers don’t
          know the “right” branch of the hierarchy to select. They need to choose the path that will lead them to the right
          content, which means they have to make the right decisions about which branch to choose at each decision point.
          If the search isn’t productive, there’s no way to know why, and there isn’t any way to adapt or iterate their behavior
          in order to make progress. Moreover, fixed taxonomies are expensive to maintain because successful search paths
          are hard coded and need to be changed as the data changes.
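
The taxonomy limitation described above can be sketched in a few lines of Python. The article titles and branch names below are hypothetical; the point is only that a record filed under one branch of a fixed hierarchy is invisible to a user who browses a different branch.

```python
# Each article is filed under exactly one path of a fixed hierarchy.
taxonomy = {
    ("News", "International"): ["Troop levels in Iraq debated", "EU trade talks resume"],
    ("Politics", "Terrorism"): ["Iraq war and security policy"],
}


def browse(path):
    """Return the articles filed under one branch; no other branch is consulted."""
    return taxonomy.get(path, [])


def articles_about(term):
    """Scan every branch for articles whose title mentions the term."""
    return [
        title
        for titles in taxonomy.values()
        for title in titles
        if term.lower() in title.lower()
    ]


# A user who guesses "News > International" never sees the article filed elsewhere.
seen = browse(("News", "International"))
missed = [title for title in articles_about("Iraq") if title not in seen]
print(missed)
```

The user receives no signal that `missed` is non-empty, which is the "no way to know why" problem: the hierarchy offers nothing to suggest that a second branch holds relevant content.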

The limitations of these different technologies are especially apparent in searching the complex content offered on these
sites.

1.3.3. Managing complex content

Media and publishing companies have several different types of data in various repositories and collections, each with its
own access and retrieval challenges:

        • Directories primarily have highly structured data, typically located in databases that are frequently updated. If the
          directory combines data from several databases, there may be different metadata or taxonomies for each database,
          especially in cases where the data comes from external sources (for example, aggregating a number of regional
          Yellow Pages print directories). In addition, a directory site may also offer some unstructured data. For example, a
          job site might have multiple data repositories – for job postings, company profiles, and lists of employers – but also
          a collection of unstructured data in the form of resumes. Combining structured and unstructured data and
          combining data from different repositories have been particularly challenging for traditional search technologies.
          This is because most technologies were built on the assumption that all data would be in the same format and
          would be located in the same repository.

        • Online publishers primarily have unstructured data, i.e., long-form text documents. Looking through thousands of
          results – each with pages and pages of text – for the right piece of information is a tremendous challenge for users.
          That’s why it’s critical for companies to supply readers with an intuitive experience that allows them to easily and
          quickly identify which document is the most relevant. Traditional search technologies do not have the capabilities to
          extract structure from unstructured data. Yet this structure is necessary to provide users with the context they need
          to make these refinement decisions. Some search technologies offer rigid taxonomies or categorization schemes,
          but that won’t suffice either for the reasons discussed above. In addition, most online publishers use sophisticated
          content management systems to store, tag, and publish the data. Consequently, it is essential that the search
          engine has the capability to extract data from these systems in order to allow for fast and flexible indexing.

        • Multimedia producers and suppliers have a distinct type of data in the form of images, audio, and video
          – requiring flexible and powerful indexing and search capabilities. In these instances, it is even more important for
          the search technology to leverage the metadata associated with the content because that’s where most of the
          context for searching resides. This information may be held in a digital asset management system – requiring adaptors
          to extract it for data collection.

If the common search technologies discussed above are employed in cases where complex data exists, information
retrieval challenges will persist – and information seekers will avoid using the applications whenever possible.

The good news is that there is now a breakthrough solution designed to overcome the “million or none” results impasse
of traditional search technology and to access and retrieve all kinds of data across diverse systems. All of this is possible
without encountering high implementation and maintenance costs that are associated with many site implementations.


2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA

Endeca provides solutions and best practices designed specifically for each kind of information provider – directories,
online publishers, and multimedia suppliers. Underlying these three solutions is a common technology core: the Endeca
Information Access Platform, which includes the Endeca Navigation Engine. This core technology overcomes the
limitations of traditional search engines and addresses the data challenges facing media and publishing companies.

Built on the Endeca Information Access Platform, Endeca solutions for directories, online publishers, and multimedia
content providers offer a single, fast, easy, and effective way to search and browse large volumes of data in structured and
unstructured formats – across all types of systems. Endeca solutions integrate search and navigation, providing the
flexibility and control needed to allow users to search intuitively and effectively. These solutions also return the results of all
searches in a precise navigation context that improves users’ future predictive search choices, give them relevant tools to
adapt their search at each stage, and encourage meaningful search iteration and revision.

At each stage of the search process, customers progress toward their goal, which means that they are staying on the site
longer and returning to the site more often, resulting in an increase in site activity (for example, page views, click-through
rates, session duration, etc.). And because they ultimately find what they want, they are satisfied and become long-term,
loyal customers. The Endeca technology platform also offers a low total cost of ownership and is designed for ease of
installation and use.

Based on nine pending patents, the Endeca Information Access Platform includes the following features and capabilities,
which make these customer and financial benefits possible.

2.1. Guided Navigation – A Breakthrough Technology

The Endeca Information Access Platform includes the Endeca Navigation Engine, which executes innovative browsing
technology called “Guided Navigation.” This helps users refine and explore relevant results to overcome the “million or
none” obstacle, so they can quickly and easily find what they are looking for and even discover information they didn’t
know existed. Specifically, Guided Navigation provides:

2.1.1. Faceted navigation overcomes the limits of taxonomy solutions

In general, navigation helps users who are not familiar with data to ask smarter questions by exposing all the choices that
are available to them. But Guided Navigation goes far beyond current browse solutions by making a new kind of navigation
possible. Based on faceted navigation, a multi-dimensional approach advocated by information scientists as a far more
efficient and easy-to-use way to find information than taxonomies, Guided Navigation:



        • Creates hundreds of valid browse paths to each record, rather than just the few paths available in a taxonomy,
          tremendously increasing the likelihood that a user will find a record

        • Allows users to prioritize their choices in their own personalized way rather than forcing users down the arbitrary
          path of the taxonomist

        • Updates all navigation options at each click, showing users all the valid questions they can ask next and eliminating
          millions of possible dead-end paths.

        • Integrates fully with search, making it possible to refine long lists of search results, and search navigation options.
          (See Section 2.2.1 below for additional details about integrated search and Guided Navigation.)
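
The behavior listed above can be illustrated with a minimal Python sketch of faceted navigation in general. It is not Endeca's implementation: the records, the facet names, and the `refine` and `next_options` helpers are all hypothetical, and a production engine would work from an index rather than scanning every record.

```python
from collections import Counter

# Hypothetical records; each carries a value for several independent facets.
ARTICLES = [
    {"title": "Election preview", "section": "Politics", "year": 2004, "region": "US"},
    {"title": "Iraq war update", "section": "International", "year": 2005, "region": "Middle East"},
    {"title": "Security policy and Iraq", "section": "Politics", "year": 2005, "region": "Middle East"},
    {"title": "Transfer window roundup", "section": "Sports", "year": 2005, "region": "Europe"},
]
FACETS = ("section", "year", "region")


def refine(records, selections):
    """Keep only the records that match every facet value chosen so far."""
    return [r for r in records if all(r[f] == v for f, v in selections.items())]


def next_options(records, selections):
    """List the facet values still present in the current results, with counts.

    Values that would lead to zero results never appear, so no click dead-ends.
    """
    current = refine(records, selections)
    return {
        facet: dict(Counter(r[facet] for r in current))
        for facet in FACETS
        if facet not in selections
    }


# The same record is reachable whichever facet the user picks first.
by_year_first = refine(ARTICLES, {"year": 2005, "section": "Politics"})
by_section_first = refine(ARTICLES, {"section": "Politics", "year": 2005})
print(by_year_first == by_section_first)

# After choosing a region, only refinements that still have results are offered.
print(next_options(ARTICLES, {"region": "Middle East"}))
```

Because the options are recomputed from the current result set after every selection, "Sports" is not offered once the user has chosen "Middle East", whereas a fixed taxonomy would still display that branch and let the user walk into an empty result.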

2.1.2. An intuitive, easy-to-use interface

Simply calculating which questions users can ask next is not enough to facilitate search success because there are usually
thousands of choices. In fact, the best way to organize navigation options changes markedly as users narrow from a vast
space down close to a result. Adapting to the changing situation, Guided Navigation intelligently reorganizes those options
with each click in the most meaningful, relevant way – and presents those new choices in a clear on-screen list that shows
users the next step.

                           Guardian Unlimited has seen a significant increase in search activity on the site as readers use Guided Navigation to browse and refine their
                           search results.

The result is a more effective interface, strongly preferred by end-users over traditional solutions, that provides easy
access to the power and flexibility of Guided Navigation. Users see progress as they search and discover related information
they didn’t know existed, so they remain on the site, exploring the data and finding relevant content. Because their search
experience is meaningful and successful, they are satisfied and consistently return to the site.

2.1.3. The power to search both structured and unstructured data

Although media and publishing companies have different data profiles, Endeca’s technology was built with inherent
capabilities to meet their different needs. Endeca can handle a wide range of data formats: from unstructured documents with
basic metadata or fielded information; to semi-structured customer data, product information, XML pages, and
auto-classified documents; to highly structured parametric data and databases. In fact, the Endeca Information Access Platform
allows users to seamlessly bridge and explore large content collections consisting of structured data, unstructured data,
or both – from all kinds of sources: content management, digital asset management, and other enterprise systems;
relational databases; file servers; websites; intranets; and portals. Endeca technology also supports more than 350 file
formats and 250 languages.

But searching structure is not enough; users must be able to navigate structure to leverage its full value. However,
databases and search engines are optimized for either structured data or unstructured data and miss the full value in bridging
the two. The Endeca Information Access Platform captures the most valuable aspect of structure: navigating relationships
between records. In a patent-pending process called “meta-relational indexing,” the Endeca Navigation Engine builds out
all the latent connections between structured and unstructured elements in the data. This indexing process enables it to
handle sources with differing metadata and taxonomies as well as unstructured data. As a result, customers can find what
they’re looking for because they’re searching within a relevant context, and sites eliminate costly labor expenses typically
associated with the taxonomy and content management process.

2.2. Advanced Search Features

Endeca incorporates best-of-breed search functionality to help users quickly and easily find the information they need.
Unlike other search solutions, it gives better results by analyzing information in context and leveraging structured,
unstructured, and relational information to give users the most meaningful results. Specifically, it provides:

2.2.1. Integrated search and Guided Navigation

Traditional enterprise search applications create artificial distinctions between search and navigation, and between
structured and unstructured information, because they are designed around legacy technology limitations. Endeca is the
first solution to fully integrate search and navigation, giving users the speed and power to search – and bridge –
structured and unstructured information.

        • Guided Navigation: Analyzing search logs reveals that users typically enter broad one- or two-word queries for the
          vast majority of searches, leading to a uselessly long list of results. Guided Navigation solves this pervasive problem
          by instantly returning the results of all searches in a precise navigation context that shows users all the valid ways
          to refine and explore further. The navigation context exposes and organizes structure associated with search results
          in a meaningful way to help users find information.

        • Combination of navigation category and full-text matches: Search queries are resolved against both structured
          navigation categories (which link to more relevant results) and full-text fields (which return a more extensive set
          of results). For example, a search for “Florists” in a directory application returns a category match like “Personal
          Services > Florists,” navigation categories such as “Events & Occasion,” and navigation refinements such as
          “Funerals,” as well as a ranked list of businesses that are most relevant to the word “Florists.” In an online publishing
          application, a search for “Iraq” returns a category match like “International Relations > Iraq,” navigation categories
          such as “Publication Year,” and navigation refinements such as “2005,” as well as a ranked list of articles with the
          word “Iraq” in the title, author, body, or other critical fields.
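
The behavior described in the two bullets above can be illustrated with a minimal sketch. It is not Endeca's implementation: the sample records, the dimension names, and the `search` function are hypothetical, and relevance ranking is left out. It shows only the idea of resolving a query against both category names and full text, then offering just those refinements that still lead to results.

```python
from collections import Counter

# Hypothetical sample records: free text plus structured navigation dimensions.
ARTICLES = [
    {"title": "Iraq talks resume", "body": "Negotiators met in Baghdad.",
     "dims": {"Topic": "International Relations > Iraq", "Publication Year": "2005"}},
    {"title": "Election results", "body": "Voters in Iraq went to the polls.",
     "dims": {"Topic": "Politics > Elections", "Publication Year": "2005"}},
    {"title": "Oil prices climb", "body": "Analysts cite supply concerns in Iraq.",
     "dims": {"Topic": "Business > Energy", "Publication Year": "2004"}},
    {"title": "Local florist expands", "body": "A new shop opens downtown.",
     "dims": {"Topic": "Business > Retail", "Publication Year": "2005"}},
]


def search(records, query, selected=None):
    """Return category matches, full-text hits, and the valid refinements."""
    q = query.lower()
    selected = selected or {}

    # 1. Category matches: dimension values whose name contains the query.
    category_matches = sorted({
        value
        for r in records
        for value in r["dims"].values()
        if q in value.lower()
    })

    # 2. Full-text matches, restricted to the refinements already selected.
    hits = [
        r for r in records
        if q in (r["title"] + " " + r["body"]).lower()
        and all(r["dims"].get(d) == v for d, v in selected.items())
    ]

    # 3. Refinements: count dimension values over the current hits only,
    #    so every choice offered leads to at least one result.
    refinements = {}
    for r in hits:
        for dim, value in r["dims"].items():
            if dim not in selected:
                refinements.setdefault(dim, Counter())[value] += 1

    return category_matches, hits, refinements
```

Calling `search(ARTICLES, "Iraq")` returns the category match, three articles, and counts per publication year and topic. Passing `selected={"Publication Year": "2004"}` narrows the hits and recomputes the remaining refinements from them.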





© 2006 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of
Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.
2.2.2. Sharp answers to fuzzy questions

Typical search engines respond with “no results found” to roughly 25% of queries, without giving users any confidence that
the system even understood their query. This happens because users have no way to know the exact spelling, syntax, or
word choices used in the underlying data. Endeca’s variant search uses linguistic analysis and the following techniques
to fix many of these near misses, relieving the user of the burden of having to know the precise terminology of the data
before they can ask useful questions:

        • Spell correction: Endeca’s smart algorithms combine phonetic analysis on search terms and underlying data to
          correct misspellings and detect alternate spellings. This patent-pending technology is based on the data in the par-
          ticular data set, removing the need to build and maintain a custom dictionary. Yet companies can tune the phonetic
          spelling corrector to make trade-offs between search precision (i.e., getting only the exact or very close results)
          and search recall (i.e., returning more results to ensure that data relevant to the user’s search isn’t missed).

        • Word stemming: Linguistic analysis of data finds word form variations including plurals, prefixes, suffixes, and
          conjugations.

        • Bi-directional thesaurus and synonyms: Customized thesauri and synonyms are implemented at both the navigation
          category and full-text level. For example, a user’s query for “sushi” in a restaurant directory can be expanded
          to return the navigation category “Cuisine > Japanese” and/or all items with the phrase “Japanese cuisine” in their
          text description. Moreover, Endeca technology can perform asymmetrical synonym matches, in which a search for
          “Iraq” would also return articles containing the keywords “Baghdad” and “Saddam Hussein,” but a search for
          “Saddam Hussein” may not return all articles with the keyword “Iraq.” What’s more, synonyms can be maintained over
          time with simple GUI tools, and regular search logs can help identify new terms to add to the thesaurus and list of
          synonyms.

        • Relevance ranking: Endeca’s unmatched, highly configurable relevancy ranking makes sure that the right results
          are at the top of the list. Endeca offers a variety of relevancy ranking modules that take into account a broad range
          of factors including term frequency, word positions and proximity, document date, document popularity, what field
          the term occurs in – and many other characteristics. These modules can be flexibly tuned and combined to execute
          sophisticated, customized search strategies that optimize information retrieval in the context of a specific applica-
          tion – rather than just offering a black-box approach to relevance like many competing solutions. Developers can
          even combine modules in different ways to create different search strategies within one application. For example,
          the relevancy ranking can change depending on which specific set of documents a user is searching, which specific
          part of the application a user is searching, or even which user is searching.
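
Two of the techniques in the list above, data-driven spell correction and asymmetrical synonyms, can be sketched in a few lines. This is an illustration under stated assumptions, not Endeca's algorithm: it uses plain string similarity from Python's standard `difflib` module rather than phonetic analysis, and the vocabulary, thesaurus entries, and function names are hypothetical.

```python
import difflib

# Hypothetical vocabulary harvested from the indexed data itself,
# so no separate dictionary has to be maintained.
INDEX_VOCABULARY = ["iraq", "baghdad", "sushi", "japanese", "florists"]

# One-way (asymmetric) thesaurus: a query for the key is expanded with the
# listed terms, but a query for a listed term is not expanded with the key.
ONE_WAY_THESAURUS = {
    "iraq": ["baghdad", "saddam hussein"],
    "sushi": ["japanese cuisine"],
}


def correct_spelling(term, vocabulary=INDEX_VOCABULARY, cutoff=0.75):
    """Map a term to its closest indexed term, or return it unchanged.

    A higher cutoff favors precision; a lower cutoff favors recall.
    """
    term = term.lower()
    if term in vocabulary:
        return term
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else term


def expand_query(term):
    """Spell-correct the term, then add any one-way synonyms for it."""
    corrected = correct_spelling(term)
    return [corrected] + ONE_WAY_THESAURUS.get(corrected, [])
```

In this sketch, `expand_query("Iraq")` adds “baghdad” and “saddam hussein” to the query, while `expand_query("Bagdad")` is corrected to “baghdad” and expanded no further, which is the asymmetry the text describes. The `cutoff` parameter stands in for the precision versus recall trade-off.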

2.2.3. Adding structure to unstructured content

Endeca is a leader in extracting and exploiting structure from semi-structured or unstructured data.4 This occurs during
its data transformation and indexing processes by a number of methods:

        • Entity extraction: Endeca automatically extracts entities (people, places, and organizations) found in unstructured
          documents based on a variety of natural language processing techniques and statistical inference. In addition, the
          extraction process is self-training. Once a new type of entity (for example, product names) has been extracted from
          a number of documents, Endeca automatically extracts that entity type as metadata during subsequent indexing.

        • Inherent metadata: Endeca can extract metadata (data about documents such as their date of creation, file
          type, and file size) from more than 370 file types, including documents with no inherent structure such as Word and
          PDF files. This valuable information is then used by Endeca’s Guided Navigation and search features for information
          access and retrieval. This capability is particularly powerful in cases where documents have some consistent
          metadata (for example, in content management systems) and is critical for unstructured data.

        • Contextual metadata: Endeca can also extract and leverage existing information about records held in a file system.
          For example, the file structure, including elements of the file path, can be parsed and added to the record as
          metadata. A document containing information about a company’s next product release may be found using a file path
          such as “Product Management > 2005 Product Releases > Product Release 2.0.” This information can then be used
          for making search refinements through Endeca’s Guided Navigation capabilities. In cases where file structures are
          very hierarchical, this process can add several layers of metadata.


        • Concept extraction: Endeca offers the ability to extract key concepts from unstructured data via existing or
          imported, pre-built thesauri involving hundreds of industry-standard taxonomies in dozens of subject domains and
          languages. These thesauri also expand queries to include related terms.

        • Rules-based tagging: Endeca can use rules to add still more tags to documents during its process of acquiring con-
          tent from original sources. Rules can be as simple as tagging all documents containing the text “MSFT” or “Micro-
          soft” with <Microsoft> or as sophisticated as employing Boolean syntax and developing a rule stating <if X AND Y>
          and <date=June03> add <TAG> for records from June 3 that include both X and Y. To facilitate implementing rules-
          based tagging, Endeca leverages industry-standard thesauri, taxonomies, and controlled vocabularies.
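
As a rough illustration of the contextual metadata and rules-based tagging bullets above, the sketch below enriches a record during content acquisition. Everything in it is an assumption made for the example: the `Record` class, the `path_level_N` field names, the ISO date format, the second rule's terms (“earnings” and “forecast” stand in for X and Y), and the tag names.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Record:
    path: str
    text: str
    date: str                      # ISO date, e.g. "2005-06-03" (assumed format)
    tags: set = field(default_factory=set)
    metadata: dict = field(default_factory=dict)


def add_path_metadata(record):
    """Contextual metadata: turn each folder in the file path into a level."""
    folders = PurePosixPath(record.path).parent.parts
    for depth, folder in enumerate(folders, start=1):
        record.metadata[f"path_level_{depth}"] = folder


# Each rule is (predicate, tag). Predicates are plain functions, so Boolean
# combinations (AND, OR, date checks) are ordinary Python expressions.
RULES = [
    (lambda r: "MSFT" in r.text or "Microsoft" in r.text, "Microsoft"),
    (lambda r: "earnings" in r.text and "forecast" in r.text
               and r.date == "2005-06-03", "June3-Earnings-Forecast"),
]


def apply_rules(record, rules=RULES):
    """Add the tag of every rule whose predicate matches the record."""
    for predicate, tag in rules:
        if predicate(record):
            record.tags.add(tag)
```

A record stored at `Product Management/2005 Product Releases/Product Release 2.0.doc` would gain two path levels as metadata. Both kinds of output can then serve as navigation dimensions for refinement.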

2.2.4. Targeted searching

Users have a powerful but easy-to-use suite of functionality to hone the recall and relevance of their results:

        • Search within results: Users can refine their search process by launching iterative searches against their results.
          (They can also refine results with Guided Navigation.)

        • Parametric search: A parametric search interface gives users the option to simultaneously filter by ranges of
          information along multiple navigation dimensions. The parametric search options dynamically update as the user
          selects refinements, so that the user never reaches a dead end. Users can select only combinations of
          refinements that lead to actual, relevant results.

        • Dynamic concept discovery: Endeca offers users the ability to refine results by concept clusters. For example, a
          search for “eagles” will return thousands of relevant articles in an online publishing application. Endeca’s technol-
          ogy will then help users refine the results to get to the article they’re looking for by presenting clusters of articles
          relating to unique but relevant key concepts – for example, the sports team (Philadelphia Eagles), the band (Eagles),
          and the birds.

        • Automatic phrasing: Endeca treats a series of words (for example, “Tom Cruise”) as a single
          phrase, improving the relevancy of results. In this case, it might be set to return only documents where
          “Tom” and “Cruise” are adjacent, greatly enhancing the precision of results. Endeca can also offer users the
          opportunity to opt in or opt out of the phrasing.
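
The automatic phrasing bullet above can be made concrete with a small sketch. It is a simplified illustration rather than Endeca's implementation. The `tokenize` and `matches` functions are hypothetical, and the `phrase` flag models the user's choice to opt in or out.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matches(document, query, phrase=True):
    """Return True if the document matches the query.

    phrase=True  -> query words must appear adjacent and in order.
    phrase=False -> query words may appear anywhere (the user opted out).
    """
    doc_tokens = tokenize(document)
    query_tokens = tokenize(query)
    if not query_tokens:
        return False
    if not phrase:
        return all(tok in doc_tokens for tok in query_tokens)
    n = len(query_tokens)
    return any(doc_tokens[i:i + n] == query_tokens
               for i in range(len(doc_tokens) - n + 1))
```

With phrasing on, a document reading “Tom Hanks took a cruise to Alaska” does not match the query “Tom Cruise”. With phrasing off it does, which shows the gain in precision that phrasing provides.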

2.3. Content Spotlighting

Content Spotlighting is an out-of-the-box capability for highlighting specific, relevant content on-screen as well as gener-
ally grouping or arranging search results – based on defined business rules. Frequently used in merchandising for cross-
selling and up-selling, it can also be used to disclose popular or richer content related to a query or for targeted adver-
tising. For example, if a user is searching for articles on the “Red Sox” in an online publishing application, the business
owner could use Content Spotlighting to highlight premium content that is only available on their site like live highlight
videos, player statistics, or articles from featured sports columnists. If a user is searching for a high-paying nursing job in
the Buckhead area of Atlanta on an online job site, the business owner can use Content Spotlighting to offer its advertisers
(hospitals) the opportunity to buy highly targeted advertising inventory (for example, on web pages with content on nurs-
ing, high salary range, Buckhead) instead of just the category “Nursing”.

Integrated with search and Guided Navigation, Content Spotlighting is data-driven, interactively responding to users’
search activity – as specified by the business rules. It can be triggered by search terms or Guided Navigation choices. It
can also be triggered by user profile information. During a query, rules are dynamically selected to provide users with the
most relevant content possible – i.e. content related to both what they are looking for and to the user’s profile (for ex-
ample, demographics, click behavior, etc.). This capability represents an advanced feature that other search technologies
can’t provide dynamically and at scale.
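
One way to picture the rule triggering described above is the sketch below. It is a simplified model, not Endeca's rule engine. The `SpotlightRule` fields, the rule names, and the sample content are hypothetical, chosen to mirror the Red Sox and nursing examples in this section.

```python
from dataclasses import dataclass, field


@dataclass
class SpotlightRule:
    name: str
    content: str
    search_terms: set = field(default_factory=set)   # any one term triggers
    refinements: dict = field(default_factory=dict)  # all must be selected
    profile: dict = field(default_factory=dict)      # all must match


RULES = [
    SpotlightRule("red-sox-premium", "Live highlight videos",
                  search_terms={"red sox"}),
    SpotlightRule("nursing-ad", "Hospital recruiting ad",
                  refinements={"Category": "Nursing", "Salary Range": "High",
                               "Area": "Buckhead"}),
    SpotlightRule("subscriber-columnists", "Featured sports columnists",
                  search_terms={"red sox"}, profile={"subscriber": True}),
]


def select_spotlights(rules, query, selected, profile):
    """Return the content of every rule triggered by this request."""
    phrase = query.lower()
    fired = []
    for rule in rules:
        term_ok = (not rule.search_terms
                   or any(t in phrase for t in rule.search_terms))
        nav_ok = all(selected.get(k) == v for k, v in rule.refinements.items())
        profile_ok = all(profile.get(k) == v for k, v in rule.profile.items())
        if term_ok and nav_ok and profile_ok:
            fired.append(rule.content)
    return fired
```

Because each rule is plain data, a business user could in principle edit the rules without touching the selection logic, which is the separation this section describes.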

As a result of these features, Content Spotlighting significantly helps users find what they are looking for and, more im-
portant, frequently enables them to discover information and content that they didn’t know existed. It also enables com-
panies to promote premium or featured content and highly relevant and targeted advertisements. In this way, it boosts
search effectiveness and efficiency and creates a very compelling user experience. Business owners can use Content
Spotlighting to highlight the premium content that’s available on their site (and only their site) and help users see the
value in the paid subscription or registration. This contributes to greater customer satisfaction and loyalty and creates site
“stickiness” and repeat usage.


World Book makes it possible to search content of all types, supplementing articles in multiple languages with rich media including videos, audio clips,
                        photos, and structured tables, from multiple content repositories.



Content Spotlighting is also easy to implement and manage – even for complex content collections. Business users – with-
out IT help – can easily define the rules that drive Content Spotlighting placements using an intuitive, web-based Endeca
interface designed specifically for their needs, versus the needs of the IT department. Once the rules are implemented,
they are updated dynamically, and changing the parameters is easy. As a result, the need to use costly IT resources for
these tasks is eliminated, and business managers spend less time managing the placements – decreasing costs overall.

2.4. Additional Platform Features

In addition to supplying users with unique technology that promotes search success and increased site activity, the Endeca
platform is designed for ease of implementation and maintenance, lowering the burden on IT resources and providing
companies with a successful information access and retrieval solution with a low total cost of ownership.

2.4.1. Single interface to multiple data sources

As mentioned, content often originates in separate data stores or includes various document formats and structured data
schemas. The Endeca Information Access Platform crosses these boundaries to give users a seamless and single access
point to all data, regardless of its origin. A search might transparently cross, for example, image files, XML files, and PDFs
because Endeca supports:

        • Multiple formats: Endeca can search the most popular document formats including PDFs, Word docs, HTML, and
          many more (over 350 different file types). Likewise, structured data might originate in an RDBMS, XML database, or
          many other sources.

        • Multiple data sources: Data can originate in separate silos, and users can search all sources from a single inter-
          face.

        • Permissions: Individuals and groups can gain access to subsets of data based on their login ID. Guided Navigation
          options always perfectly reflect only the valid choices available to a specific user, giving everyone a customized
          view.
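
The permissions bullet above implies that navigation options are computed only from the records a user is allowed to see. A minimal sketch of that idea follows. The record fields, group names, and function names are hypothetical, and a real deployment would resolve groups from the login ID.

```python
from collections import Counter

# Hypothetical records: each lists the groups allowed to see it.
RECORDS = [
    {"title": "Q1 report", "format": "PDF", "groups": {"staff", "analysts"}},
    {"title": "Press photo", "format": "Image", "groups": {"staff", "public"}},
    {"title": "Feed export", "format": "XML", "groups": {"analysts"}},
]


def visible_records(records, user_groups):
    """Keep only records that share at least one group with the user."""
    return [r for r in records if r["groups"] & user_groups]


def format_refinements(records, user_groups):
    """Count 'format' values over the records this user may see."""
    return Counter(r["format"] for r in visible_records(records, user_groups))
```

A user in the hypothetical `public` group would be offered only the “Image” refinement, while an analyst would see “PDF” and “XML”. Neither is shown a choice that leads to content they cannot open.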



2.4.2. Open architecture

The Endeca Information Access Platform extracts and integrates data from multiple disparate sources including relational
databases, file servers, web sources (XML files), and content management systems and other packaged applications. It
integrates with diverse source systems via packaged adapters and APIs to transfer data by a range of approaches, including
data extracts, adapters, web crawlers, file server crawlers, and its own SDK – the Endeca Content Acquisition Developers’
Kit for building custom adapters.

2.4.3. A high-performing, low-cost infrastructure

The Endeca Information Access Platform provides a powerful solution at a low total cost of ownership, based on the fol-
lowing features and capabilities:

        • A standards-based architecture: Endeca integrates easily into the enterprise infrastructure. At the data level,
          Endeca has been designed to work with content from all kinds of systems and in all kinds of formats. It also
          integrates easily with other applications via a rich set of APIs. This flexibility makes the Endeca platform easy
          to deploy and allows companies to leverage their existing architecture.

        • Easy scaling: Because Endeca is built on a distributed platform, it scales easily for both increasing data volumes
          and site traffic while maintaining fast search performance – just by adding inexpensive, commodity servers.

        • High performance: Endeca provides sub-second response times to queries because its meta-relational indexing
          makes highly aggressive use of memory, multi-threading, index compression techniques, and cache engineering.
          This speed enhances the user experience, contributing to customer loyalty.


3. ENDECA ROI

Because of its innovative technology, the Endeca Information Access Platform meets the challenges of finding the right
information in complex content collections. Furthermore, media and publishing leaders have found that Endeca solutions
are quick and easy to deploy and maintain, and are enthusiastically adopted by broad audiences of information seekers. As
a result, they produce early and continuing ROI in several areas.

3.1. Improved Customer Retention and Acquisition

The Endeca Information Access Platform offers users a powerful, intuitive user experience that highlights premium
content and differentiates the site from other commodity content sites – promoting customer satisfaction and, ultimately,
customer retention and acquisition. Endeca’s fast and easy indexing gets content on-site quickly and cost-effectively,
ensuring that media and publishing companies have rich, up-to-date content that their competitors lack.

Because of Endeca’s powerful Guided Navigation, advanced search, and Content Spotlighting capabilities, customers can
easily find the premium content they seek and can even discover previously unknown but relevant information. Features
like Endeca’s intuitive interface, configurable relevancy ranking, and scalability also enhance the customer experience and
ensure a search proceeds to the right result quickly and easily. For example, Endeca enabled World Book to increase the
speed of its search eight to ten times over its previous technology while offering richer search results (for example,
images and maps) relating to the subject being researched.

In addition, Endeca’s reporting tools provide sites with information on usage and trends, like popular search terms, docu-
ments, or images. This information allows site developers to fine-tune features like relevancy ranking, thesauri, and Con-
tent Spotlighting to further enhance search success and direct customers to desirable and relevant content.

As a result of this positive search experience, customers spend more time exploring the site and finding even more rel-
evant information. They also return to the site with increasing loyalty – and create a positive buzz. This word-of-mouth, in
turn, results in growing brand recognition and easier customer acquisition.

Customer results tell the story best. With Endeca solutions:

        • Calls to customer support at Nando Media (a McClatchy Company) dropped by 15-20% because customers found
          what they wanted by themselves.


        • World Book increased speed of search by 8-10x while providing richer search results.

        • 78% of users of a leading classifieds directory preferred the new site over the old experience; 72% of those users
          pointed to Guided Navigation & the Endeca breadcrumb as the driver of loyalty.

3.2. Increased Revenues

Just as Endeca’s superior user experience helps to improve customer retention and acquisition, it also leads to higher
revenues. For example, after a site upgrade featuring a new Endeca solution, World Book increased its sales by 20%.

With Endeca, revenue benefits can occur in several other areas. Most relevant to media and publishing companies, En-
deca can help increase site activity, increase advertising revenues, and increase subscriptions and registrations.

3.2.1. Transaction revenues

As companies get more of their premium information online easily and cost-effectively with Endeca’s indexing, and more
customers find what they are looking for via advanced search and Guided Navigation, conversion rates rise, leading to
increased revenues. Satisfied customers return to the site to look for more research reports or case studies, for example,
and the number of purchases per unique visitor increases. As a result, Endeca customers have seen margins and overall
transactional revenues grow significantly.

3.2.2. Advertising revenues

As the number of customers and page views increases, this improvement in site traffic and traffic quality directly impacts
ad revenues – attracting advertisers to the site and creating additional advertising inventory for them to
buy. In addition, as visitors access different pages and explore the content more deeply, there is more relevant,
high-quality ad inventory to sell, and that ad inventory commands a higher price.

Just as important, the increase in site visits (from repeat and new customers) also improves the likelihood of a larger
number of ad impressions and higher click-through rates (the basis for CPM and CPC pricing, respectively) – especially
because Content Spotlighting allows sites to target ads to pre-qualified customers based on their search and navigation
paths. The result is more revenue generated per page view and per advertiser. For example, a leading newspaper publisher
in the UK saw a stunning increase of 20% in page views and 40% in click-through rates.

3.2.3. Subscription and registration revenues

With all of the free content sites available today (for example, search engines, blogs, and content aggregators), it is dif-
ficult to justify subscription fees or even free registrations to your potential customer base. The most important way (and,
ironically, the easiest way!) to show the value of the subscription fee is by improving the search experience, so that users
can find that premium content that’s available only on your site. If users can’t find the content that really makes up the
value of the subscription fee, there’s no way they’ll pay for access, and they won’t even take the time to register to access
the content. In 2005, InfoCommerce Group reported that companies with subscription-based services lose 15-20% of their
subscriber base each year because subscribers couldn’t find the information they were looking for, even though it actually
did exist. Additionally, 25% of paid registrants log into the service once, find that the experience is difficult and
frustrating, and never log in again. Obviously that same 25% don’t renew their subscriptions.5 Endeca’s integrated search,
Guided Navigation, and Content Spotlighting capabilities give companies the ability to highlight valuable content and users the ability to
find valuable content. As a result, several of Endeca’s customers have seen increases in subscriptions and registrations
as their users quickly see the importance of their content versus the free content sites.

3.2.4. Licensing revenues

Once again, because Endeca easily enables users to navigate through content, find what they are looking for, and discover
new content, site traffic increases. As a result, distributors and publishers are willing to pay higher licensing fees for ac-
cess to premium data because they can see the value of the content and consistently find the specific piece of content they
need to support their own businesses. For example, advertising agencies or news publishers are more likely to be willing
to pay a higher licensing fee to a stock photography site if they have an easy and fast way to find and purchase the photos
that they need for the print ad or article that will be released in tomorrow’s edition of the daily newspaper.



3.3. Lower Total Cost of Ownership

The easy-to-use, open technology of the Endeca Information Access Platform decreases total cost of ownership. While
traditional search technologies with rigid schemas and taxonomies require intensive IT efforts to deploy and update and
high-cost hardware to run queries, Endeca’s special approach to indexing and GUI-driven system tools allow for rapid
system deployment and maintenance, including data cleansing and updates. Implementing and maintaining an Endeca
solution is easier, less time-consuming, and, therefore, less costly – resulting in early ROI. For example, leading informa-
tion provider IHS cut millions of dollars in IT labor costs over five years.

Furthermore, Endeca solutions run on commodity hardware, reducing the hardware expenses of traditional search. They
also scale economically as more data and users are added to the system – just by adding commodity servers.


4. CONCLUSION

The Endeca Information Access Platform brings new information retrieval functionality – and significant financial and
competitive benefits – to media and publishing companies. Built on innovative Endeca Guided Navigation® technology, it
overcomes obstacles to retrieving complex information and exposes relevant content to users.

With access to this information through an easy-to-use interface and an intuitive, productive approach to navigating infor-
mation, customers find what they are looking for and discover other relevant content. This successful search and browse
experience encourages them to explore the site, viewing more pages and often purchasing or downloading more information
per visit, as well as to return to the site. As a result, revenues – from transactions, subscriptions and registrations,
ads, and licensing – grow. And because Endeca technology is easy and cost-effective to use, deploy, and maintain,
companies lower their total cost of ownership.

In other words, from its initial deployment and throughout its daily use, the Endeca Information Access Platform increases
profits, lowers costs, and improves customer satisfaction – providing a competitive advantage. These advantages make it
an economical – and critical – infrastructure application for media and publishing companies.


5. FOOTNOTES
1   Outlook, 2005
2   IDC, 2004
3   Research on this topic includes:

        • Nicholas J. Belkin. School of Communication, Information and Library Studies at Rutgers University. An overview of
          his work can be found at http://mariner.rutgers.edu/tipster/cladp97.html

        • Stuart Card and Peter Pirolli. Information Foraging Theory.
          www2.parc.com/istl/projects/uir/pubs/items/UIR-1999-05-Pirolli-Report-InfoForaging.pdf

        • Jared Spool. User Interface Engineering Report. http://www.uie.com/articles/three_click_rule/

        • Don Norman. The Design of Everyday Things (Currency, 1990).
4   Forrester Research, “The Future of Enterprise Search,” 2003.
5   InfoCommerce 2005, The Conference for Data Publishers, November 6-8, 2005





Mais conteúdo relacionado

Destaque

Oracle business analytics and endeca approach Document
Oracle business analytics and endeca approach DocumentOracle business analytics and endeca approach Document
Oracle business analytics and endeca approach DocumentNitai Partners Inc
 
EBS-endeca-technical-considerations
EBS-endeca-technical-considerationsEBS-endeca-technical-considerations
EBS-endeca-technical-considerationsBerry Clemens
 
Enterprise asset management analytics
Enterprise asset management analyticsEnterprise asset management analytics
Enterprise asset management analyticsNitai Partners Inc
 
ETIS10 - BI Business Requirements - Presentation
ETIS10 - BI Business Requirements - PresentationETIS10 - BI Business Requirements - Presentation
ETIS10 - BI Business Requirements - PresentationDavid Walker
 
Warehouse components
Warehouse componentsWarehouse components
Warehouse componentsganblues
 
Capturing Business Requirements For Scorecards, Dashboards And Reports
Capturing Business Requirements For Scorecards, Dashboards And ReportsCapturing Business Requirements For Scorecards, Dashboards And Reports
Capturing Business Requirements For Scorecards, Dashboards And ReportsJulian Rains
 
Oracle Commerce Using ATG & Endeca - Do It Yourself Series
Oracle Commerce Using ATG & Endeca - Do It Yourself SeriesOracle Commerce Using ATG & Endeca - Do It Yourself Series
Oracle Commerce Using ATG & Endeca - Do It Yourself SeriesKeyur Shah
 
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMicrosoft Data Warehouse Business Intelligence Lifecycle - The Kimball Approach
Microsoft Data Warehouse Business Intelligence Lifecycle - The Kimball ApproachMark Ginnebaugh
 
Sample - Data Warehouse Requirements
Sample -  Data Warehouse RequirementsSample -  Data Warehouse Requirements
Sample - Data Warehouse RequirementsDavid Walker
 
Capturing Data Requirements
Capturing Data RequirementsCapturing Data Requirements
Capturing Data Requirementsmcomtraining
 
Gathering And Documenting Your Bi Business Requirements
Gathering And Documenting Your Bi Business RequirementsGathering And Documenting Your Bi Business Requirements
Gathering And Documenting Your Bi Business RequirementsWynyard Group
 
07. Analytics & Reporting Requirements Template
07. Analytics & Reporting Requirements Template07. Analytics & Reporting Requirements Template
07. Analytics & Reporting Requirements TemplateAlan D. Duncan
 
Gathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesGathering Business Requirements for Data Warehouses
Gathering Business Requirements for Data WarehousesDavid Walker
 
White Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project ManagementWhite Paper - Data Warehouse Project Management
White Paper - Data Warehouse Project ManagementDavid Walker
 

Endeca business white paper for media and publishing

  • 1. Information Access Solutions for Media and Publishing Endeca Business White Paper ENDECA 55 Cambridge Parkway Cambridge, MA. 02142 Telephone 617.577.7999
  • 2.
Information Access Solutions for Media and Publishing
Endeca Business White Paper

TABLE OF CONTENTS

1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES    3

    1.1. Introduction
    1.2. The Negative Business Impact
       1.2.1. Poor customer acquisition and retention
       1.2.2. Lost revenues
       1.2.3. High technology costs, high content management costs
    1.3. Technology Obstacles to Traditional Information Access and Retrieval
       1.3.1. Search behavior: human-centered design
       1.3.2. Why traditional search technologies often fail
       1.3.3. Managing complex content

2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA    6

    2.1. Guided Navigation – A Breakthrough Technology
       2.1.1. Faceted navigation overcomes the limits of taxonomy solutions
       2.1.2. An intuitive, easy-to-use interface
       2.1.3. The power to search both structured and unstructured data
    2.2. Advanced Search Features
       2.2.1. Integrated search and Guided Navigation
       2.2.2. Sharp answers to fuzzy questions
       2.2.3. Adding structure to unstructured content
       2.2.4. Targeted searching
    2.3. Content Spotlighting
    2.4. Additional Platform Features
       2.4.1. Single interface to multiple data sources
       2.4.2. Open architecture
       2.4.3. A high-performing, low-cost infrastructure

3. ENDECA ROI    12

    3.1. Improved Customer Retention and Acquisition
    3.2. Increased Revenues
       3.2.1. Transaction revenues
       3.2.2. Advertising revenues
       3.2.3. Subscription and registration revenues
       3.2.4. Licensing revenues
    3.3. Lower Total Cost of Ownership

4. CONCLUSION    14

5. FOOTNOTES    14

© 2005 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.
  • 3.
1. INFORMATION ACCESS AND RETRIEVAL CHALLENGES

1.1. Introduction

Information providers of all types, including directories, news and magazine publishers, and multimedia content suppliers, continue to make substantial investments in their online business. This market is still growing quickly, as consumers spend a larger percentage of their time online, and revenue dollars follow them. For example, online ad spending grew 28.6% in the second quarter of 2005 while newspaper print ads grew only 1.9% in the same period.¹

To take best advantage of this opportunity, traditional media and publishing companies have diversified into online delivery – investing heavily in popular web technologies and IT resources. But this is only half the battle. The fight for market and wallet share on the web is equally fierce. New, free online media – web search engines, portals, and blogs – are gaining momentum in the traditional media space. In 2004, 37% of households used the free web as their only information source.² This free content puts downward pressure on margins – and turns content into a commodity. To win online, media and publishing companies must differentiate themselves by offering not only premium content, but also a better user experience.

But getting content online – and, more important, making it easily accessible – isn’t simple. Backlogs of proprietary information are often huge, and content and data sources are proliferating almost exponentially. Even once this information is online, it usually isn’t easy to find. Users input a query and often get a million results or, worse, no results at all. Then they have no meaningful way to browse further to find what they are looking for, aside from taking another shot in the dark. This is the same frustrating search experience they have on the free web.

But for companies in the business of providing information online – which have to compete with the free web – successful information access is paramount. Poor search has a serious negative business impact on customer retention and acquisition and, consequently, revenues. Information access failures can affect short-term and long-term profitability.

What can companies do about this problem, and why should they do it, considering their already significant IT investments? Here’s a closer look at how these search failures negatively affect the business, why they occur, and what new solution can improve search success to help differentiate a site and provide competitive advantage.

1.2. The Negative Business Impact

Over time, information access and retrieval difficulties produce negative business consequences in three areas – customer acquisition and retention, revenue, and profitability.

1.2.1. Poor customer acquisition and retention

With an increasing array of information sources available – including commodity search engines like Google and Yahoo and content aggregators such as wikis and blogs – premium content providers must first attract customers away from these free resources, and then ensure that they stay on the site. Even if there is unique content on the site, users will quickly abandon it if they aren’t able to find that content. Customers will stay on the site and continue to come back to it if they believe that valuable content is available and that they have an easy way to access it.

These search failures can have a short- and long-term impact on the success of the business. In the short term, search failures diminish customer satisfaction and loyalty and, if constantly repeated, result in failed relationships and lost customers. What’s more, if enough customers defect because of poor search and retrieval, negative word-of-mouth spreads. The ability to attract new customers is hampered, brand equity is destroyed, and market share begins to deteriorate. The company is suddenly at a significant competitive disadvantage.
In short, just providing users with premium content isn’t enough to get a competitive edge if that content is not easily accessible. A superior user experience – one that is better than the hit-or-miss searches of free Internet resources – is a necessary differentiator to encourage customer loyalty and site usage.

1.2.2. Lost revenues

Information access failures also lead to lost revenues in several different areas, including subscription and licensing revenue, transaction or sales revenue, and advertising revenue.

© 2006 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.
  • 4.
Online publishers that rely on subscription fees as a primary revenue source need an effective information access solution in order to acquire and retain subscribers. If users can’t easily find relevant content, they won’t see the value in paying for a subscription or the need to renew a current one. As a result, overall revenues from the subscription service will decline.

Multimedia content providers that depend on licensing revenue will also see a decline in business if their customers fail to renew their license agreements because they are unable to find the content that they are seeking. Competitors with better search experiences will steal these users.

A primary source of revenue for some media and publishing companies may be sales transactions – commonly referred to as “pay-per-piece” transactions. For example, in addition to their subscription-based services, market research firms sell their reports individually, and some online publishers sell certain articles individually. Multimedia content providers may sell photos and other graphic images, audio files, and video files on a per-piece basis. Each failed search is a lost transaction.

Poor search also has a subtler, but just as significant, negative impact on advertising revenues. Over the past few years, the focus for directories and information publishers has primarily been on getting their data from print to the online publication. As a result, the search experience is poor, and users are not getting the value that they expect from these sites. Now add to this content access problem the fact that advertisers spend money on sites that have high page views and give them the ability to target ads to relevant customers.
The result is twofold: low traffic, as users quickly abandon the site to look for an alternative source of information, and low advertising revenue, as advertisers abandon the site because their click-through and conversion rates are low.

To lure advertisers to their site, companies need not just a certain volume of traffic, but high-quality traffic, which can typically be measured by the number of unique visitors to the site and the number of page views per unique visitor or per session. These traffic figures speak directly to the volume and quality of the traffic and, consequently, affect the advertising revenues generated by the site. Publishers charge advertisers on a CPM (cost per thousand impressions) or CPC (cost per click) basis. If an advertiser believes that it is going to receive high-quality traffic (and a large volume of it), it will be willing to pay a higher CPM or CPC.

Site traffic also affects the amount of ad inventory. The fewer the page views, the less ad inventory there will be to sell to potential advertisers. In the worst-case scenario, advertisers fail to patronize the site at all because of the potentially poor traffic metrics.

Consequently, and once again, search failures can have a negative impact on another important revenue stream for media and publishing companies – advertising revenue. A high-quality user experience, especially one that provides easy access to sought-after content, can and will determine the amount of ad revenues generated in the short and long term.

1.2.3. High technology costs, high content management costs

Most media and publishing companies have already made a large investment in hardware, software, and IT talent in order to implement their site and get their premium content online. Adding search capabilities often requires multiple expensive servers to handle the volume of traffic and the added complexity of searching large volumes of data.
What’s more, with most search technologies, maintaining and updating the content once it’s online can be equally expensive, especially if the site requires the creation and maintenance of hard-coded taxonomies. Other search engines may also require expensive hardware in order to add content or update the existing content because of the complexity of the software application. To ensure a high-quality user experience, business user tools are essential, but they are often non-existent, expensive, or difficult to use. When the number of users and the amount of revenues stagnate or decline, all of these costs together can result in a high total cost of ownership and, consequently, lower total profitability.

These negative impacts are interrelated. Their common underlying problems lie in the inherent limitations of commonly used search technologies, especially for searching complex information collections involving multiple data sources that include both structured and unstructured data types.

1.3. Technology Obstacles to Traditional Information Access and Retrieval

Many web design packages and other related software, such as content management systems, come equipped with search capabilities. Companies often choose to either leverage these existing systems for search or buy other traditional search technologies. These companies quickly discover that this traditional search functionality is limited in its ability to actually find the relevant information. This is particularly true if the data has both structured and unstructured characteristics, is large in volume and complexity, and resides in disparate repositories. A closer look at human search behavior and traditional search technologies shows why these limitations exist.
  • 5.
1.3.1. Search behavior: human-centered design

When searchers are having a hard time finding the right content, it’s not for lack of ingenuity; it’s for lack of the right tools to match their inventiveness and flexibility. While many search software companies have upgraded their technology to improve existing tools, the right approach lies in first studying human searching behavior to figure out which tools are the right tools to build. While such user-centered design is considered a best practice in other endeavors, it has been overlooked in the critical field of information access and retrieval – until now.

Research conducted by top information scientists and user experience experts supports the notion that people looking for information follow consistent behaviors over a wide range of tasks. They follow a particular pattern of behavior as they initiate a task, which changes as they proceed through the task, and then as they either finish or abandon a task. In order to continue to make progress, they need a different set of tools at each step of the process. The appropriate approach, missed by traditional search technology, arms users with all the tools they need to find what they want. When users finally have the right tools, the tools themselves feel intuitive and transparent, creating a superior user experience and resulting in customer satisfaction and loyalty. In short, the path to business objectives lies in supporting user goals.³

1.3.2. Why traditional search technologies often fail

Traditional search technologies each have their inherent limits:

• Keyword search: The effectiveness of keyword (or full-text) search relies on users to predict what might constitute a good query, yet paradoxically, they don’t yet know enough about the content to know what to ask. Prediction fails because wrong guesses yield the extremes of “no results” or too many answers, especially in response to broad keywords – and no helpful guidance on why their prediction failed. For example, searching on a generic term like “sports” in a large collection of news articles can generate thousands of results. Furthermore, relevance algorithms fail to put the most useful results at the top of long lists. If users don’t know precisely what to ask for as they initiate their search task and there is no effective way to narrow the list of results, users will quickly abandon the site and look for the information elsewhere.

• Navigating taxonomies or fixed classifications: For information seekers to navigate fixed hierarchical taxonomies, they have to make predictions about where to find the content they want. Are all the possible articles about the war in Iraq under the “International” branch in “News”? Certainly not. There may also be articles about the war in Iraq under “Terrorism” in the “Politics” branch. In other words, some information may be hidden because customers don’t know the “right” branch of the hierarchy to select. They need to choose the path that will lead them to the right content, which means they have to make the right decisions about which branch to choose at each decision point. If the search isn’t productive, there’s no way to know why, and there isn’t any way to adapt or iterate their behavior in order to make progress. Moreover, fixed taxonomies are expensive to maintain because successful search paths are hard-coded and need to be changed as the data changes.

The limitations of these different technologies are especially apparent in searching the complex content offered on these sites.

1.3.3. Managing complex content

Media and publishing companies have several different types of data in various repositories and collections, each with its own access and retrieval challenges:

• Directories primarily have highly structured data, typically located in databases that are frequently updated. If the directory combines data from several databases, there may be different metadata or taxonomies for each database, especially in cases where the data comes from external sources (for example, aggregating a number of regional Yellow Pages print directories). In addition, a directory site may also offer some unstructured data. For example, a job site might have multiple data repositories – for job postings, company profiles, and lists of employers – but also a collection of unstructured data in the form of resumes. Combining structured and unstructured data and combining data from different repositories have been particularly challenging for traditional search technologies, because most technologies were built on the assumption that all data would be in the same format and would be located in the same repository.

• Online publishers primarily have unstructured data, i.e., long-form text documents. Looking through thousands of results – each with pages and pages of text – for the right piece of information is a tremendous challenge for users.
  • 6.
That’s why it’s critical for companies to supply readers with an intuitive experience that allows them to easily and quickly identify which document is the most relevant. Traditional search technologies do not have the capabilities to extract structure from unstructured data. Yet this structure is necessary to provide users with the context they need to make these refinement decisions. Some search technologies offer rigid taxonomies or categorization schemes, but that won’t suffice either, for the reasons discussed above. In addition, most online publishers use sophisticated content management systems to store, tag, and publish the data. Consequently, it is essential that the search engine has the capability to extract data from these systems in order to allow for fast and flexible indexing.

• Multimedia producers and suppliers have a completely unique type of data in the form of images, audio, and video – requiring flexible and powerful indexing and search capabilities. In these instances, it is even more important for the search technology to leverage the metadata associated with the content because that’s where most of the context for searching resides. This information may be held in a digital asset management system – requiring adaptors to extract it for data collection.

If the common search technologies discussed above are employed in cases where complex data exists, information retrieval challenges will persist – and information seekers will avoid using the applications whenever possible.

The good news is that there is now a breakthrough solution designed to overcome the “million or none” results impasse of traditional search technology and to access and retrieve all kinds of data across diverse systems. All of this is possible without the high implementation and maintenance costs that are associated with many site implementations.

2. THE ENDECA PLATFORM – MAXIMIZING INFORMATION ACCESS AND RETRIEVAL OF ALL KINDS OF DATA

Endeca provides solutions and best practices designed specifically for each kind of information provider: directories, online publishers, and multimedia suppliers. Underlying these three solutions is a common technology core: the Endeca Information Access Platform, which includes the Endeca Navigation Engine. This core technology overcomes the limitations of traditional search engines and addresses the data challenges facing media and publishing companies.

Built on the Endeca Information Access Platform, Endeca solutions for directories, online publishers, and multimedia content providers offer a single, fast, easy, and effective way to search and browse large volumes of data in structured and unstructured formats – across all types of systems. Endeca solutions integrate search and navigation, providing the flexibility and control needed to allow users to search intuitively and effectively. These solutions also return the results of all searches in a precise navigation context that improves users’ future predictive search choices, give them relevant tools to adapt their search at each stage, and encourage meaningful search iteration and revision.

At each stage of the search process, customers progress toward their goal, which means that they are staying on the site longer and returning to the site more often, resulting in an increase in site activity (for example, page views, click-through rates, and session duration). And because they ultimately find what they want, they are satisfied and become long-term, loyal customers.

The Endeca technology platform also offers a low total cost of ownership and is designed for ease of installation and use. Based on nine pending patents, the Endeca Information Access Platform includes the following features and capabilities, which make these customer and financial benefits possible.

2.1. Guided Navigation – A Breakthrough Technology

The Endeca Information Access Platform includes the Endeca Navigation Engine, which executes innovative browsing technology called “Guided Navigation.” This helps users refine and explore relevant results to overcome the “million or none” obstacle, so they can quickly and easily find what they are looking for and even discover information they didn’t know existed. Specifically, Guided Navigation provides:

2.1.1. Faceted navigation overcomes the limits of taxonomy solutions

In general, navigation helps users who are not familiar with data to ask smarter questions by exposing all the choices that are available to them. But Guided Navigation goes far beyond current browse solutions by making a new kind of navigation possible. Based on faceted navigation, a multi-dimensional approach advocated by information scientists as a far more efficient and easy-to-use way to find information than taxonomies, Guided Navigation:
• Creates hundreds of valid browse paths to each record, rather than just the few paths available in a taxonomy, tremendously increasing the likelihood that a user will find a record
• Allows users to prioritize their choices in their own personalized way rather than forcing them down the arbitrary path of the taxonomist
• Updates all navigation options at each click, showing users all the valid questions they can ask next and eliminating millions of possible dead-end paths
• Integrates fully with search, making it possible to refine long lists of search results and to search the navigation options themselves (see Section 2.2.1 below for additional details about integrated search and Guided Navigation)

2.1.2. An intuitive, easy-to-use interface

Simply calculating which questions users can ask next is not enough to facilitate search success because there are usually thousands of choices. In fact, the best way to organize navigation options changes markedly as users narrow from a vast space down close to a single result. Adapting to the changing situation, Guided Navigation intelligently reorganizes those options with each click in the most meaningful, relevant way, and presents the new choices in a clear on-screen list that shows users the next step.

The result is a more effective interface, strongly preferred by end-users over traditional solutions, that provides easy access to the power and flexibility of Guided Navigation. Users see progress as they search and discover related information they didn’t know existed, so they remain on the site, exploring the data and finding relevant content. Because their search experience is meaningful and successful, they are satisfied and consistently return to the site.

[Figure: Guardian Unlimited has seen a significant increase in search activity on the site as readers use Guided Navigation to browse and refine their search results.]

2.1.3. The power to search both structured and unstructured data

Although media and publishing companies have different data profiles, Endeca’s technology was built with inherent capabilities to meet their different needs. Endeca can handle a wide range of data formats: from unstructured documents with basic metadata or fielded information; to semi-structured customer data, product information, XML pages, and auto-classified documents; to highly structured parametric data and databases. In fact, the Endeca Information Access Platform allows users to seamlessly bridge and explore large content collections consisting of structured data, unstructured data, or both, from all kinds of sources: content management, digital asset management, and other enterprise systems; relational databases; file servers; websites; intranets; and portals. Endeca technology also supports more than 350 file formats and 250 languages.

But searching structure is not enough; users must be able to navigate structure to leverage its full value. Databases and search engines, however, are each optimized for either structured data or unstructured data and miss the full value of bridging the two.
The Endeca Information Access Platform captures the most valuable aspect of structure: navigating relationships between records. In a patent-pending process called “meta-relational indexing,” the Endeca Navigation Engine builds out all the latent connections between structured and unstructured elements in the data. This indexing process enables it to handle sources with differing metadata and taxonomies as well as unstructured data. As a result, customers can find what they’re looking for because they’re searching within a relevant context, and sites eliminate costly labor expenses typically associated with the taxonomy and content management process.

2.2. Advanced Search Features

Endeca incorporates best-of-breed search functionality to help users quickly and easily find the information they need. Unlike other search solutions, it gives better results by analyzing information in context and leveraging structured, unstructured, and relational information to give users the most meaningful results. Specifically, it provides:

2.2.1. Integrated search and Guided Navigation

Traditional enterprise search applications create artificial distinctions between search and navigation, and between structured and unstructured information, because they are designed around legacy technology limitations. Endeca is the first solution to fully integrate search and navigation, giving users the speed and power to search, and bridge, structured and unstructured information.

• Guided Navigation: Analysis of search logs reveals that users enter broad one- or two-word queries for the vast majority of searches, leading to a uselessly long list of results. Guided Navigation solves this pervasive problem by instantly returning the results of all searches in a precise navigation context that shows users all the valid ways to refine and explore further. The navigation context exposes and organizes the structure associated with search results in a meaningful way to help users find information.
• Combination of navigation category and full-text matches: Search queries are resolved against both structured navigation categories (which link to more relevant results) and full-text fields (which return a more extensive set of results). For example, a search for “Florists” in a directory application returns a category match like “Personal Services > Florists,” navigation categories such as “Events & Occasion,” and navigation refinements such as “Funerals,” as well as a ranked list of businesses that are most relevant to the word “Florists.” In an online publishing application, a search for “Iraq” returns a category match like “International Relations > Iraq,” navigation categories such as “Publication Year,” and navigation refinements such as “2005,” as well as a ranked list of articles with the word “Iraq” in the title, author, body, or other critical fields.
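The behavior described in Section 2.2.1 can be illustrated with a small sketch. It is not Endeca code or the Endeca API: the record layout, the sample articles, and the function names (category_matches, search, refinements) are all assumptions made for illustration. It shows a query resolved against both navigation categories and full text, and results returned together with only those refinements that still lead somewhere.

```python
from collections import Counter, defaultdict

# Hypothetical sample records: free text plus facet (navigation) values.
RECORDS = [
    {"title": "Iraq elections draw large turnout", "body": "Voters queued in Baghdad.",
     "facets": {"Topic": "International Relations > Iraq", "Publication Year": "2005"}},
    {"title": "Reconstruction contracts reviewed", "body": "Auditors examined spending in Iraq.",
     "facets": {"Topic": "Business > Contracts", "Publication Year": "2004"}},
    {"title": "Local florists prepare for spring", "body": "Shops expect strong demand.",
     "facets": {"Topic": "Lifestyle > Gardening", "Publication Year": "2005"}},
]

def matches_text(record, query):
    """Full-text match: the query appears in the title or body."""
    text = f"{record['title']} {record['body']}".lower()
    return query.lower() in text

def category_matches(records, query):
    """Navigation categories whose label contains the query."""
    q = query.lower()
    return sorted({f"{dim}: {val}"
                   for r in records
                   for dim, val in r["facets"].items()
                   if q in val.lower()})

def search(records, query, selected=None):
    """Records matching the query text and every selected refinement."""
    selected = selected or {}
    return [r for r in records
            if matches_text(r, query)
            and all(r["facets"].get(d) == v for d, v in selected.items())]

def refinements(results, selected=None):
    """Count facet values over the current results only, so every
    refinement offered is guaranteed to return at least one record."""
    selected = selected or {}
    counts = defaultdict(Counter)
    for r in results:
        for dim, val in r["facets"].items():
            if dim not in selected:
                counts[dim][val] += 1
    return {dim: dict(c) for dim, c in counts.items()}

hits = search(RECORDS, "iraq")
narrowed = search(RECORDS, "iraq", {"Publication Year": "2005"})
print(category_matches(RECORDS, "iraq"))
print([r["title"] for r in hits])
print(refinements(hits))
print([r["title"] for r in narrowed])
```

Because the refinement counts are computed from the current result set rather than from a fixed taxonomy, a value with no matching records is never offered, which is one simple way to picture the "no dead ends" property.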
2.2.2. Sharp answers to fuzzy questions

Typical search engines respond with “no results found” to roughly 25% of queries, without giving users any confidence that the system even understood their query. This happens because users have no way to know the exact spelling, syntax, or word choices used in the underlying data. Endeca’s variant search uses linguistic analysis and the following techniques to fix many of these near misses, relieving users of the burden of knowing the precise terminology of the data before they can ask useful questions:

• Spell correction: Endeca’s smart algorithms apply phonetic analysis to both search terms and the underlying data to correct misspellings and detect alternate spellings. This patent-pending technology is based on the data in the particular data set, removing the need to build and maintain a custom dictionary. Companies can still tune the phonetic spelling corrector to trade off search precision (i.e., returning only exact or very close results) against search recall (i.e., returning more results to ensure that data relevant to the user’s search isn’t missed).
• Word stemming: Linguistic analysis of the data finds word-form variations including plurals, prefixes, suffixes, and conjugations.
• Bi-directional thesaurus and synonyms: Customized thesauri and synonyms are implemented at both the navigation category and full-text level. For example, a user’s query for “sushi” in a restaurant directory can be expanded to return the navigation category “Cuisine > Japanese” and/or all items with the phrase “Japanese cuisine” in their text description.
Moreover, Endeca technology can perform asymmetrical synonym matches, in which a search for “Iraq” would also return articles containing the keywords “Baghdad” and “Saddam Hussein,” but a search for “Saddam Hussein” may not return all articles with the keyword “Iraq.” What’s more, synonyms can be maintained over time with simple GUI tools, and regular review of search logs can help identify new terms to add to the thesaurus and list of synonyms.
• Relevance ranking: Endeca’s unmatched, highly configurable relevancy ranking makes sure that the right results are at the top of the list. Endeca offers a variety of relevancy ranking modules that take into account a broad range of factors including term frequency, word position and proximity, document date, document popularity, the field in which the term occurs, and many other characteristics. These modules can be flexibly tuned and combined to execute sophisticated, customized search strategies that optimize information retrieval in the context of a specific application, rather than offering only a black-box approach to relevance like many competing solutions. Developers can even combine modules in different ways to create different search strategies within one application. For example, the relevancy ranking can change depending on which specific set of documents a user is searching, which specific part of the application a user is searching, or even which user is searching.

2.2.3. Adding structure to unstructured content

Endeca is a leader in extracting and exploiting structure from semi-structured or unstructured data.⁴ This occurs during its data transformation and indexing processes by a number of methods:

• Entity extraction: Endeca automatically extracts entities (people, places, and organizations) found in unstructured documents based on a variety of natural language processing techniques and statistical inference. In addition, the extraction process is self-training.
Once a new type of entity (for example, product names) has been extracted in a number of documents, Endeca subsequently extracts that type automatically as metadata during the indexing process.
• Inherent metadata: Endeca can extract metadata (data about documents such as their date of creation, file type, and file size) from more than 370 file types, including documents with no inherent structure such as Word and PDF files. This valuable information is then used by Endeca’s Guided Navigation and search features for information access and retrieval. This capability is particularly powerful in cases where documents have some consistent metadata (for example, in content management systems) and is critical for unstructured data.
• Contextual metadata: Endeca can also extract and leverage existing information about records held in a file system. For example, the file structure, including elements of the file path, can be parsed and added to the record as metadata. A document containing information about a company’s next product release may be found using a file path such as “Product Management > 2005 Product Releases > Product Release 2.0.” This information can be used for making search refinements through Endeca’s Guided Navigation capabilities. In cases where file structures are very hierarchical, this process can add several layers of metadata.
• Concept extraction: Endeca offers the ability to extract key concepts from unstructured data via existing or imported, pre-built thesauri involving hundreds of industry-standard taxonomies in dozens of subject domains and languages. These thesauri also expand queries to include related terms.
• Rules-based tagging: Endeca can use rules to add still more tags to documents during its process of acquiring content from original sources. Rules can be as simple as tagging all documents containing the text “MSFT” or “Microsoft” with <Microsoft> or as sophisticated as employing Boolean syntax and developing a rule stating <if X AND Y> and <date=June03> add <TAG> for records from June 3 that include both X and Y. To facilitate implementing rules-based tagging, Endeca leverages industry-standard thesauri, taxonomies, and controlled vocabularies.

2.2.4. Targeted searching

Users have a powerful but easy-to-use suite of functionality to hone the recall and relevance of their results:

• Search within results: Users can refine their search process by launching iterative searches against their results. (They can also refine results with Guided Navigation.)
• Parametric search: A parametric search interface gives users the option to simultaneously filter by ranges of information along multiple navigation dimensions. The parametric search options dynamically update as the user selects refinements, so that the user will never reach a dead end. The user can select only combinations of refinements that lead to actual, relevant results.
• Dynamic concept discovery: Endeca offers users the ability to refine results by concept clusters. For example, a search for “eagles” will return thousands of relevant articles in an online publishing application.
Endeca’s technology will then help users refine the results to get to the article they’re looking for by presenting clusters of articles relating to distinct but relevant key concepts: for example, the sports team (Philadelphia Eagles), the band (Eagles), and the birds.
• Automatic phrasing: Endeca treats a series of words (for example, “Tom Cruise”) as a single phrase, improving the relevancy of results. In this case, it might be set to return only documents where “Tom” and “Cruise” are adjacent, greatly enhancing the precision of results. Endeca can also offer users the opportunity to opt in or opt out of the phrasing.

2.3. Content Spotlighting

Content Spotlighting is an out-of-the-box capability for highlighting specific, relevant content on-screen, as well as for grouping or arranging search results, based on defined business rules. Frequently used in merchandising for cross-selling and up-selling, it can also be used to surface popular or richer content related to a query or for targeted advertising. For example, if a user is searching for articles on the “Red Sox” in an online publishing application, the business owner could use Content Spotlighting to highlight premium content that is available only on their site, such as live highlight videos, player statistics, or articles from featured sports columnists. If a user is searching for a high-paying nursing job in the Buckhead area of Atlanta on an online job site, the business owner can use Content Spotlighting to offer its advertisers (hospitals) the opportunity to buy highly targeted advertising inventory (for example, on web pages with content on nursing, high salary range, and Buckhead) instead of just the category “Nursing.”

Integrated with search and Guided Navigation, Content Spotlighting is data-driven, interactively responding to users’ search activity as specified by the business rules. It can be triggered by search terms or Guided Navigation choices.
It can also be triggered by user profile information. During a query, rules are dynamically selected to provide users with the most relevant content possible, i.e., content related both to what they are looking for and to their profile (for example, demographics and click behavior). This capability represents an advanced feature that other search technologies can’t provide dynamically and at scale.

As a result of these features, Content Spotlighting significantly helps users find what they are looking for and, more important, frequently enables them to discover information and content that they didn’t know existed. It also enables companies to promote premium or featured content and highly relevant, targeted advertisements. In this way, it boosts search effectiveness and efficiency and creates a very compelling user experience. Business owners can use Content Spotlighting to highlight the premium content that’s available on their site (and only their site) and help users see the value in a paid subscription or registration. This contributes to greater customer satisfaction and loyalty and creates site “stickiness” and repeat usage.
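As a rough illustration of how business rules of this kind might be evaluated, the sketch below models a rule with three optional triggers: search terms, navigation choices, and user profile attributes. The SpotlightRule class, its field names, and the sample rules are hypothetical and are not taken from Endeca’s product; the matching policy (a rule fires only when all of its stated conditions hold) is likewise an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class SpotlightRule:
    """Hypothetical business rule; every field name here is illustrative."""
    name: str
    promoted: list                                    # content shown when the rule fires
    terms: set = field(default_factory=set)           # trigger: search terms
    refinements: dict = field(default_factory=dict)   # trigger: navigation choices
    profile: dict = field(default_factory=dict)       # trigger: user profile attributes

    def fires(self, query, selected, user):
        # Assumed policy: all stated conditions must hold; unset ones are ignored.
        words = set(query.lower().split())
        return (self.terms <= words
                and all(selected.get(k) == v for k, v in self.refinements.items())
                and all(user.get(k) == v for k, v in self.profile.items()))

RULES = [
    SpotlightRule("red-sox-premium",
                  ["Live highlight videos", "Player statistics"],
                  terms={"red", "sox"}),
    SpotlightRule("buckhead-nursing-ad",
                  ["Hospital recruitment ad"],
                  refinements={"Category": "Nursing", "Area": "Buckhead"}),
    SpotlightRule("subscriber-columnists",
                  ["Featured sports columnists"],
                  terms={"sox"}, profile={"subscriber": True}),
]

def spotlight(query, selected, user, rules=RULES):
    """Collect promoted content from every rule that fires, in rule order."""
    promoted = []
    for rule in rules:
        if rule.fires(query, selected, user):
            promoted.extend(rule.promoted)
    return promoted

print(spotlight("Red Sox scores", {}, {}))
print(spotlight("Red Sox scores", {}, {"subscriber": True}))
print(spotlight("jobs", {"Category": "Nursing", "Area": "Buckhead"}, {}))
```

Keeping the rules as plain data rather than code is what would let a business user edit them through a form-based tool, in the spirit of the rule management the paper describes.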
Content Spotlighting is also easy to implement and manage, even for complex content collections. Business users can define the rules that drive Content Spotlighting placements, without IT help, using an intuitive, web-based Endeca interface designed specifically for their needs rather than those of the IT department. Once the rules are implemented, they are updated dynamically, and changing the parameters is easy. As a result, the need to use costly IT resources for these tasks is eliminated, and business managers spend less time managing the placements, decreasing costs overall.

[Figure: World Book makes it possible to search content of all types, supplementing articles in multiple languages with rich media including videos, audio clips, photos, and structured tables, from multiple content repositories.]

2.4. Additional Platform Features

In addition to supplying users with unique technology that promotes search success and increased site activity, the Endeca platform is designed for ease of implementation and maintenance, lowering the burden on IT resources and providing companies with a successful information access and retrieval solution with a low total cost of ownership.

2.4.1. Single interface to multiple data sources

As mentioned, content often originates in separate data stores or includes various document formats and structured data schemas. The Endeca Information Access Platform crosses these boundaries to give users a single, seamless access point to all data, regardless of its origin. A search might transparently cross, for example, image files, XML files, and PDFs because Endeca supports:

• Multiple formats: Endeca can search the most popular document formats including PDFs, Word docs, HTML, and many more (over 350 different file types). Likewise, structured data might originate in an RDBMS, XML database, or many other sources.
• Multiple data sources: Data can originate in separate silos, and users can search all sources from a single interface.
• Permissions: Individuals and groups can gain access to subsets of data based on their login ID. Guided Navigation options always perfectly reflect only the valid choices available to a specific user, giving everyone a customized view.
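The permissions behavior can be pictured with a minimal sketch, assuming a simple group-based access model. The document list, the USER_GROUPS mapping, and the function names are invented for illustration and do not describe how Endeca stores or enforces permissions. The point it shows is that navigation options are computed only from the records a given login is allowed to see.

```python
from collections import Counter

# Hypothetical documents drawn from different sources, each tagged with
# the groups allowed to see it.
DOCUMENTS = [
    {"title": "Quarterly earnings report", "type": "PDF",
     "groups": {"staff", "subscribers"}},
    {"title": "Newsroom style guide", "type": "Word",
     "groups": {"staff"}},
    {"title": "Public press release", "type": "HTML",
     "groups": {"staff", "subscribers", "public"}},
]

# Hypothetical mapping from login ID to group membership.
USER_GROUPS = {"alice": {"staff"}, "bob": {"public"}}

def visible(docs, login_id):
    """Documents that share at least one group with the user."""
    groups = USER_GROUPS.get(login_id, set())
    return [d for d in docs if d["groups"] & groups]

def navigation_options(docs, login_id, dimension="type"):
    """Navigation choices (with counts) built only from visible documents."""
    return dict(Counter(d[dimension] for d in visible(docs, login_id)))

print(navigation_options(DOCUMENTS, "alice"))
print(navigation_options(DOCUMENTS, "bob"))
```

Filtering before counting means a user never sees a refinement that would lead only to records they cannot open.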
2.4.2. Open architecture

The Endeca Information Access Platform extracts and integrates data from multiple disparate sources including relational databases, file servers, web sources (XML files), content management systems, and other packaged applications. It integrates with diverse source systems via packaged adapters and APIs and transfers data by a range of approaches, including data extracts, adapters, web crawlers, file server crawlers, and its own SDK for building custom adapters, the Endeca Content Acquisition Developers’ Kit.

2.4.3. A high-performing, low-cost infrastructure

The Endeca Information Access Platform provides a powerful solution at a low total cost of ownership, based on the following features and capabilities:

• A standards-based architecture: Endeca integrates easily into the enterprise infrastructure. At the data level, Endeca has been designed to work with content from all kinds of systems and in all kinds of formats. It also integrates easily with other applications via a rich set of APIs. This flexibility makes the Endeca platform easy to deploy and allows companies to leverage their existing architecture.
• Easy scaling: Because Endeca is built on a distributed platform, it scales easily for both increasing data volumes and site traffic while maintaining fast search performance, just by adding inexpensive commodity servers.
• High performance: Endeca provides sub-second response times to queries because its meta-relational indexing makes highly aggressive use of memory, multi-threading, index compression techniques, and cache engineering. This speed enhances the user experience, contributing to customer loyalty.

3. ENDECA ROI

Because of its innovative technology, the Endeca Information Access Platform meets the challenges of finding the right information in complex content collections.
Furthermore, media and publishing leaders have found that Endeca solutions are quick and easy to deploy and maintain, and are enthusiastically adopted by broad audiences of information seekers. As a result, they produce early and continuing ROI in several areas.

3.1. Improved Customer Retention and Acquisition

The Endeca Information Access Platform offers users a powerful, intuitive experience that highlights premium content and differentiates the site from commodity content sites, promoting customer satisfaction and, ultimately, customer retention and acquisition. Endeca’s fast and easy indexing gets content on-site quickly and cost-effectively, ensuring media and publishing companies have rich, up-to-date content that their competitors lack. Because of Endeca’s powerful Guided Navigation, advanced search, and Content Spotlighting capabilities, customers can easily find the premium content they seek and can even discover previously unknown but relevant information. Features like Endeca’s intuitive interface, configurable relevancy ranking, and scalability also enhance the customer experience and ensure a search proceeds to the right result quickly and easily. For example, Endeca enabled World Book to increase the speed of its search eight to ten times over its previous technology while offering richer search results (e.g., images and maps) relating to the subject being researched.

In addition, Endeca’s reporting tools provide sites with information on usage and trends, such as popular search terms, documents, or images. This information allows site developers to fine-tune features like relevancy ranking, thesauri, and Content Spotlighting to further enhance search success and direct customers to desirable and relevant content.

As a result of this positive search experience, customers spend more time exploring the site and finding even more relevant information. They also return to the site with increasing loyalty and create a positive buzz.
This word of mouth, in turn, results in growing brand recognition and easier customer acquisition.

Customer results tell the story best. With Endeca solutions:

• Calls to customer support at Nando Media (a McClatchy Company) dropped by 15-20% because customers found what they wanted by themselves.
• World Book increased speed of search by 8-10x while providing richer search results.
• 78% of users of a leading classifieds directory preferred the new site over the old experience; 72% of those users pointed to Guided Navigation and the Endeca breadcrumb as the driver of loyalty.

3.2. Increased Revenues

Just as Endeca’s superior user experience helps to improve customer retention and acquisition, it also leads to higher revenues. For example, after a site upgrade featuring a new Endeca solution, World Book increased its sales by 20%.

With Endeca, revenue benefits can occur in several other areas. Most relevant to media and publishing companies, Endeca can help increase site activity, advertising revenues, and subscriptions and registrations.

3.2.1. Transaction revenues

As companies get more of their premium information online easily and cost-effectively with Endeca’s indexing, and more customers find what they are looking for via advanced search and Guided Navigation, conversion rates rise, leading to increased revenues. Satisfied customers return to the site to look for more research reports or case studies, for example, and the number of purchases per unique visitor increases. As a result, Endeca customers have seen margins and overall transactional revenues grow significantly.

3.2.2. Advertising revenues

As the number of customers and page views increases, this improvement in site traffic and traffic quality directly impacts ad revenues, attracting advertisers to the site and creating additional advertising inventory for them to buy. In addition, as visitors access more pages and explore the content more deeply, there is more relevant, high-quality ad inventory to sell, and that ad inventory commands a higher price.
Just as important, the increase in site visits (from repeat and new customers) also improves the likelihood of higher click-through rates and a larger number of ad impressions (CPM and CPC rates), especially because Content Spotlighting allows sites to target ads to pre-qualified customers based on their search and navigation paths. The result is more revenue generated per page view and per advertiser. For example, a leading newspaper publisher in the UK saw a stunning increase of 20% in page views and 40% in click-through rates.

3.2.3. Subscription and registration revenues

With all of the free content sites available today (for example, search engines, blogs, and content aggregators), it is difficult to justify subscription fees or even free registrations to your potential customer base. The most important way (and, ironically, the easiest way) to show the value of the subscription fee is by improving the search experience, so that users can find the premium content that’s available only on your site. If users can’t find the content that really makes up the value of the subscription fee, there’s no way they’ll pay for access, and they won’t even take the time to register to access the content. In 2005, InfoCommerce Group reported that companies with subscription-based services lose 15-20% of their subscriber base each year because subscribers couldn’t find the information they were looking for, even though it actually did exist. Additionally, 25% of paid registrants log into the service once, find that the experience is difficult and frustrating, and never log in again. Obviously, that same 25% don’t renew their subscriptions.⁵ Endeca’s integrated search, Guided Navigation, and Content Spotlighting capabilities give companies the ability to highlight valuable content and users the ability to find valuable content.
As a result, several of Endeca’s customers have seen increases in subscriptions and registrations as their users quickly see the value of their content compared with that of the free content sites.

3.2.4. Licensing revenues

Once again, because Endeca easily enables users to navigate through content, find what they are looking for, and discover new content, site traffic increases. As a result, distributors and publishers are willing to pay higher licensing fees for access to premium data because they can see the value of the content and consistently find the specific piece of content they need to support their own businesses. For example, advertising agencies or news publishers are more likely to be willing to pay a higher licensing fee to a stock photography site if they have an easy and fast way to find and purchase the photos that they need for the print ad or article that will be released in tomorrow’s edition of the daily newspaper.
3.3. Lower Total Cost of Ownership

The easy-to-use, open technology of the Endeca Information Access Platform decreases total cost of ownership. While traditional search technologies with rigid schemas and taxonomies require intensive IT efforts to deploy and update and high-cost hardware to run queries, Endeca’s special approach to indexing and its GUI-driven system tools allow for rapid system deployment and maintenance, including data cleansing and updates. Implementing and maintaining an Endeca solution is easier, less time-consuming, and, therefore, less costly, resulting in early ROI. For example, leading information provider IHS cut millions of dollars in IT labor costs over five years.

Furthermore, Endeca solutions run on commodity hardware, reducing the hardware expenses of traditional search. They also scale economically as more data and users are added to the system, just by adding commodity servers.

4. CONCLUSION

The Endeca Information Access Platform brings new information retrieval functionality, along with significant financial and competitive benefits, to media and publishing companies. Built on innovative Endeca Guided Navigation® technology, it overcomes obstacles to retrieving complex information and exposes relevant content to users.

With access to this information through an easy-to-use interface and an intuitive, productive approach to navigating information, customers find what they are looking for and discover other relevant content. This successful search and browse experience encourages them to explore the site, viewing more pages and often purchasing or downloading more information per visit, as well as to return to the site. As a result, revenues from transactions, subscriptions and registrations, ads, and licensing grow. And because Endeca technology is easy and cost-effective to use, deploy, and maintain, companies lower their total cost of ownership.
In other words, from its initial deployment and throughout its daily use, the Endeca Information Access Platform increases profits, lowers costs, and improves customer satisfaction, providing a competitive advantage. These advantages make it an economical and critical infrastructure application for media and publishing companies.

5. FOOTNOTES

1 Outlook, 2005
2 IDC, 2004
3 Research on this topic includes:
• Nicholas J. Belkin. School of Communication, Information and Library Studies at Rutgers University. An overview of his work can be found at http://mariner.rutgers.edu/tipster/cladp97.html
• Stuart Card and Peter Pirolli. Information Foraging Theory. www2.parc.com/istl/projects/uir/pubs/items/UIR-1999-05-Pirolli-Report-InfoForaging.pdf
• Jared Spool. User Interface Engineering Report. http://www.uie.com/articles/three_click_rule/
• Don Norman. The Design of Everyday Things (Currency, 1990).
4 Forrester Research, “The Future of Enterprise Search,” 2003.
5 InfoCommerce 2005, The Conference for Data Publishers, November 6-8, 2005

© 2006 Endeca Technologies, Inc. All rights reserved. Endeca, Endeca Latitude, Endeca Navigation Engine, and Endeca Data Foundry are registered trademarks. Guided Navigation is a service mark of Endeca Technologies, Inc. All other product and service names mentioned herein are or may be registered trademarks or trademarks of their respective companies or organizations.