Cognizant 20-20 Insights | January 2012

How to Develop Online Recommendation Systems that Deliver Superior Business Performance

Executive Summary

Over the past two decades, the Internet has emerged as the mainstream medium for online shopping, social networking, e-mail and more. Corporations also view the Web as a potential business accelerator. They see the huge volume of transactional and interaction data generated by the Internet as R&D that informs the creation of new and more competitive services and products.

Several “e-movement” crusaders have discovered that customers spend significant amounts of time researching products before purchasing. To assist customers in these efforts and save them time, these organizations suggest products that users may be interested in. This serves the dual purpose of attracting browsers and converting them into buyers. For instance, an online bookstore may know from previous site visits that a customer is interested in mobile technology, and suggest relevant titles to purchase. An uninitiated user may be impressed by such suggestions. Suggestions (or “recommendations,” as they are popularly known) predict the likes and dislikes of users. To offer meaningful recommendations to site visitors, these companies need to store huge amounts of data pertaining to different user profiles and their corresponding interests. This eventually culminates in information overload, or difficulty in understanding the data and making informed decisions. One solution for combating this issue is what is known as a recommendation system.

Many major e-commerce Websites are already using recommendation systems to provide relevant suggestions to their customers. The recommendations could be based on various parameters, such as items popular on the company’s Website; user characteristics such as geographical location or other demographic information; or past buying behavior of top customers.

This white paper presents an overview of how we are helping a Fortune 500 organization to implement a recommendation system. It also sheds light on key challenges that may be encountered when implementing a recommendation system built on the open source Apache Mahout,1 a large-data library of statistical and analytical algorithms.

Recommendation Systems

Recommendation systems can be considered a valuable extension of traditional information systems used in industries such as travel and hospitality. However, recommendation systems have mathematical roots and are more akin to artificial intelligence (AI) than any other IT discipline. A recommendation system learns from a customer’s behavior and recommends products in which that customer may be interested. At the heart of recommendation systems are machine-learning constructs. Leading e-commerce players use recommendation engines that sift users’ past purchase histories to recommend products such as magazine articles, books, goods, etc. Here is how major e-commerce companies use recommendation engines to improve their sales and their customers’ shopping experience.

• Amazon.com: Depending on past purchases and user activity, the site recommends products of user interest.2

• Netflix: Recommends DVDs in which a user may be interested by category, such as drama, comedy, action, etc. Netflix went so far as to offer a $1 million3 prize to researchers who could improve its recommendation engine.

• eBay: Collects user feedback about its products, which is then used to recommend products to users who have exhibited similar behaviors.4

Online companies that leverage recommendation systems can increase sales by 8% to 12%.5

Companies that succeed with recommendation engines are those that can quickly and efficiently turn vast amounts of data into actionable information.

Anatomy of a Recommendation Engine

The key component of a recommendation system is data. This data may be garnered by a variety of means such as customer ratings of products, feedback/reviews from purchasers, etc. This data serves as the basis for recommendations to users. After data collection, recommendation systems use machine-learning algorithms to find similarities and affinities between products and users. Recommender logic programs are then used to build suggestions for specific user profiles. This technique of filtering the input data and giving recommendations to users is also known as “collaborative filtering.”6

Along with collaborative filtering, recommendation systems also use other machine-learning techniques such as clustering and classification of data. Clustering is a technique used to bundle large amounts of data into similar categories. It is also used to reveal data patterns and make huge amounts of data simpler to manage. For instance, Google News7 creates clusters of similar news information when grouping diverse arrays of news articles. Many other search engines use clustering to group results for similar search terms.

Classification is a technique used to decide whether new input or a search term matches a previously observed pattern. It is also used to detect suspicious network activity. Yahoo! Mail8 uses classification to decide if an incoming message is spam. Image sharing sites like Picasa9 use classification techniques to determine whether photos contain human faces. They then offer recommendations of people that are identified in the user contacts list.10

A Robust System to Counter Information Overload

We are working with a leading multinational manufacturing company that has numerous product research labs, with many scientists and researchers in numerous countries working on different technologies. To help facilitate scientific research, and to buy the latest technology information, this client partnered with information providers such as Scopus, Knovel, etc. But despite these data sources, scientists and researchers were often unable to find the right information to improve their research. Also, scientists across the globe were unable to collaborate and share technical information with each other. This situation is typical of companies dealing with information overload.

To increase the informational awareness of scientists and other employees, the client wanted to create a system to recommend resources like patents, articles and journals from paid content providers. A successful system needed to learn from user searches and be intelligent enough to recommend popular resources similar to the ones that a user is currently working from. The system was also expected to provide scientists with useful insights on information other scientists across the globe are using. Finally, the system was to serve as a platform to connect scientists working on similar technologies.

We helped to design and develop the system, which was dubbed “intelligent recommendation system” (IRS). Many of the problems the organization faced originated from the multiple preferences and needs of users pertaining to their individual research topics. To make the system adaptive to specific user requirements, the solution proposed was to use a recommendation system. As a first step towards the solution, the large information base possessed by the client was categorized/grouped by specific criteria. After much consideration of the data and the size of the user base, we decided to implement the system using the Apache Mahout framework.
The Mahout framework is highly flexible and lets developers customize outcomes according to their ad hoc requirements. We then developed a customized algorithm to recommend relevant resources to scientists and researchers.

Building an Intelligent Recommendation System

The Solution

The most important purpose of an intelligent recommendation system (IRS) is to increase awareness among scientists about the areas they are exploring, the technologies on which their colleagues are working, and information about experts and their views on that particular discipline. Despite the possession of huge amounts of data, there was very little insight into the information scientists were seeking. There are fundamental differences between designing a recommendation system and traditional software design. The overall design depends heavily on the choice of algorithms and the system architecture employed. By using Apache Mahout, the team selected a conventional open source framework that implements machine-learning algorithms.

Apache Mahout is a new Apache Software Foundation (ASF) project whose primary goal is to create scalable machine-learning algorithms that are free to use under the Apache license. The term “Mahout” is derived from the Hindi word that means elephant driver. The Mahout project started in 2008 as a subproject of Apache’s Lucene project, which is a popular search engine library. Given the amount of data that the client possessed, it was imperative that the IRS be highly scalable. Mahout is considered a superior way of building recommendation systems because it implements all three machine-learning techniques: collaborative filtering, clustering and classification. Collaborative filtering is the primary technique used by Apache Mahout to provide recommendations. Given rating data along with a set of users and items, collaborative filtering generates recommendations in one of the following four ways.

• User-based: Recommendations are made based on users with similar characteristics.

• Item-based: Recommendations are based on similar items.

• Slope-one: A fast technique that offers recommendations based on previous user-rated items.

• Model-based: This approach compares the profile of an active user to aggregate user clusters, rather than the concrete profiles.

There are many algorithms that are used to calculate similarities between two entities. The choice of algorithm plays a vital role in determining the quality of the recommendations, and the most suitable choice depends on the scenario. Since the IRS for this client needed to offer recommendations to scientists based on their multiple preferences, we adopted user-based collaborative filtering (see Figure 1).


Recommendation System Execution Flow

[Diagram components: User, Intelligent Recommendation System, Data Model, Neighborhoods, Similarity Algorithms, Recommender Logic Program, Evaluator Program, Recommendations]

Figure 1
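
The flow in Figure 1 can be illustrated with a minimal sketch of user-based collaborative filtering over Boolean (visited / not visited) data. This is plain Python written for illustration, not the Apache Mahout API and not the client's customized algorithm; the sample data and the function names `tanimoto`, `nearest_neighbors` and `recommend` are assumptions introduced here.

```python
from collections import defaultdict

# Data model (hypothetical sample): user -> set of article ids the user visited.
visits = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a2", "a3", "a4"},
    "carol": {"a1", "a5"},
    "dave": {"a6"},
}


def tanimoto(items_a, items_b):
    """Tanimoto (Jaccard) coefficient: shared items / items seen by either user."""
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)


def nearest_neighbors(user, data, n=2):
    """Neighborhood: up to n (similarity, neighbor) pairs with non-zero similarity."""
    scored = [
        (tanimoto(data[user], items), other)
        for other, items in data.items()
        if other != user
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    # Highest similarity first; ties are broken by name for a deterministic order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:n]


def recommend(user, data, n_neighbors=2, how_many=3):
    """Recommender logic: rank unseen items by the summed similarity of the
    neighbors who visited them."""
    scores = defaultdict(float)
    for similarity, neighbor in nearest_neighbors(user, data, n_neighbors):
        for item in data[neighbor] - data[user]:
            scores[item] += similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


print(recommend("alice", visits))  # items visited by alice's closest neighbors
print(recommend("dave", visits))   # dave shares no items with anyone
```

In this sample, a user such as `dave`, who shares no visited articles with anyone, has an empty neighborhood and receives no recommendations, which is why usage-based similarity alone is not sufficient for new or isolated users.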



Providing Recommendations

Creating an Input

Within the process of generating recommendations, a crucial step is for the IRS to generate rating and usage data. Such data can show which articles are popular and which have the highest read counts. To collect and consolidate this information, our IRS applied the Boolean data model. This technique tracks which article/product was visited by each particular user. This strategy is widely used in “social bookmarking” sites. In the IRS developed for this client, articles were rated in a range of 1 to 2, depending on the number of times an article was visited or read. A decimal number closer to 1 signifies fewer hits, while numbers closer to 2 signify higher usage. This rated data is then grouped into one of the following data models. In this context, data model refers to the abstraction layer which encapsulates product data and corresponding user ratings.

The following are the data models that can be used to create an IRS depending on input data.

• In-memory data model: Data is prepared using programs which make objects of the data with product properties like product id, user id and ratings.

• File-based data: Data is provided in a comma-separated file along with a preference file. Implementation of a data refresh module will be required to reload data into the comma-separated file.

• Database-based data: Relational databases will be a good choice if the data is in the order of gigabytes. However, the implication is that a recommendation engine based on this model will be much slower than in-memory data representations.

For our client, an IRS was created using a file-based data model. This approach delivers superior performance and execution time compared with a database-based model. Also, to sustain the freshness of the file and thereby return newer articles and resources with reduced latency, a scheduled refresh module was used which runs daily to create the rating data for new resources.

Finding Similarity Between Users/Items

After creating a data model, the next step in building an IRS is finding similarities and patterns between users and items. There are many algorithms which can be used to find similarities: GenericUserSimilarity, TanimotoCoefficientSimilarity, PearsonCorrelationSimilarity, SpearmanCorrelationSimilarity and EuclideanDistanceSimilarity, to name a few. By far the most common algorithm is TanimotoCoefficientSimilarity, as it accepts both binary and numeric input data; it considers only whether a preference exists, not its value. The Tanimoto coefficient provides similarity based on the overlap between the sets of items for which preferences exist. In the IRS we were building, due to the lack of rating data for all items, our team decided to customize the similarity algorithm based on user demographic properties. Demographic properties such as country, division, job title, department, etc. were used to establish similarity among users. The output of this algorithm is pivotal, as it is the base on top of which recommendations are built.

Generating Recommendations

Once similarity is established, special recommender logic programs are used to recommend items to users. This is very similar to matching on user preferences. Our IRS used a customized recommender logic program which analyzes and matches the ratings of one user with similar user ratings and recommends articles/items which match the user's interest. The output of this recommender logic program is the actual set of recommendations for the user. In the case of multiple domains such as fields of science (ceramics, nanotechnology, etc.), recommendations might not be accurate since user interests and expertise vary significantly. For such scenarios, recommendations should be evaluated against the user's domain or demographics. Programs which aid in cross-verification of the results of recommendation engines are known as “evaluators.” Our IRS validates recommended items with different relevance parameters such as scientist domain, technology and scientist interest. The output of this program proved beneficial in offering concrete recommendations to scientists and researchers and, hence, conserved time.

Key Challenges

The problems faced during development were compounded by the complexity of the algorithms used, as well as the common problems of software design. Apart from these, issues unique to AI applications, such as the need for good-quality input data, also arose. Common algorithms such as Tanimoto and Pearson similarity have several advantages, such as the ability to detect the quality and features of an item at the time of recommendation, especially when a user rating on a scale of 1 to 5 is available. Another advantage is that these algorithms are useful in domains where analyzing data is difficult or costly, such as gathering data about science articles where domain knowledge is required to rate an article.

The first challenge in the development of our IRS was that the system was brand new and lacked user rating information. A situation with a huge amount of input data but few users and no ratings is referred to as the “cold start problem.” A solution to this problem was to use the TanimotoCoefficient11 algorithm and a binary dataset to feed the system with the much-sought-after data during the initial use of the system. In the course of time, user rating data will be obtained, after which the similarity algorithm and input data structure can be changed. This initial dataset can be prepared by collecting data such as maximum hits of a particular product or tracking visits of articles. If a binary dataset is adopted, then the most suitable algorithms are TanimotoCoefficientSimilarity or LogLikelihoodSimilarity,12 since these are designed for binary data. TanimotoCoefficientSimilarity uses the Jaccard13 coefficient formula, while LogLikelihoodSimilarity is based on a log-likelihood ratio. As the usage of the recommendation system grows, huge amounts of data will be collected. It is advisable at this point to use clustering mechanisms to sustain performance. A solution to deal with massive data by means of data clustering is Hadoop.14 Data clustering requires multiple machines to handle massive data sets and stores all resources that are required to provide recommendations.

The next challenge is providing recommendations to a user who is not part of a clear group; these users can be called “grey sheep.” The challenge of giving recommendations to new or unknown users is called the “grey sheep problem.” This problem mostly arises in Internet applications with large and diverse customer bases that access the IRS application. The probability of this problem is significantly lower in intranet applications. A solution to this problem is to look for similar users based on similar demographic information or characteristics. These users can be technically called “neighbors.” Similar neighbors can be found by implementing different similarity algorithms or by checking similar user properties. To determine similarity, a “classification” machine-learning mechanism can also be used. The classification algorithm trains the computer to classify similar users in a particular group and teaches the IRS to find the specific group when a user is new to the system.

Mahout currently supports two types of classifier approaches: the Naive Bayes classifier and the Complementary Naive Bayes classifier. The Naive Bayes classifier is the faster method for problems such as grey sheep recommendations. This classifier includes two parts. First, it analyzes the existing documents or data and finds the common features among them. This step is also called preparing training examples. Second, classification is performed using the model that was created in the first step, applying the Bayes theorem15 to predict the category of a new item (see Figure 2).
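
The demographic matching described above, used both for the customized similarity algorithm and for finding neighbors of grey sheep users, could look roughly like the following sketch. The property names come from the text (country, division, job title, department); the sample profiles, the weights, the threshold and the function names are assumptions for illustration, since the paper does not state how properties were weighted or compared.

```python
# Hypothetical user profiles; property names follow the text, values are invented.
profiles = {
    "u1": {"country": "IN", "division": "materials",
           "job_title": "scientist", "department": "ceramics"},
    "u2": {"country": "IN", "division": "materials",
           "job_title": "scientist", "department": "nanotech"},
    "u3": {"country": "US", "division": "logistics",
           "job_title": "analyst", "department": "planning"},
}

# Assumed weights: here division and department count twice as much as the rest.
WEIGHTS = {"country": 1.0, "division": 2.0, "job_title": 1.0, "department": 2.0}


def demographic_similarity(a, b, weights=WEIGHTS):
    """Weighted share of properties on which two profiles agree, in [0, 1]."""
    total = sum(weights.values())
    if not total:
        return 0.0
    matched = sum(
        weight
        for prop, weight in weights.items()
        if a.get(prop) is not None and a.get(prop) == b.get(prop)
    )
    return matched / total


def demographic_neighbors(user_id, all_profiles, threshold=0.5):
    """Ids of users whose similarity to user_id meets the threshold, best first."""
    me = all_profiles[user_id]
    scored = [
        (demographic_similarity(me, profile), other)
        for other, profile in all_profiles.items()
        if other != user_id
    ]
    scored = [pair for pair in scored if pair[0] >= threshold]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored]


print(demographic_neighbors("u1", profiles))
```

Because this similarity needs no usage or rating history, it can supply neighbors for a user on first login; the trade-off is that it assumes colleagues with similar roles have similar interests.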


Diagram of Classification Technique

[Diagram, training row: Input Documents/Items; Training Examples; Predictor Variable; Training Algorithms, e.g. Bayes]
[Diagram, prediction row: New Items/Documents; Predictor Variable; Model; Estimated Variable; Predicted Category/Decision]
[Diagram annotations: Based on Demographic Properties of User; Use User’s Property to Classify; Detect Criteria to Classify the Users; Similar Users are Classified in Specific Group; Predicted Property on which User Can Be Classified]

Figure 2
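
The two-part technique in Figure 2 can be sketched as a small categorical Naive Bayes classifier that assigns a new user to a group from demographic properties. This is a from-scratch illustration with add-one smoothing, not Mahout's classifier; the class name `NaiveBayes`, the feature names, the group labels and the training rows are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, rows, labels):
        """Part 1: build the model from training examples."""
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # feature_counts[label][feature][value] -> number of training rows
        self.feature_counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values seen per feature
        for row, label in zip(rows, labels):
            for feature, value in row.items():
                self.feature_counts[label][feature][value] += 1
                self.values[feature].add(value)
        return self

    def log_scores(self, row):
        """log P(label) + sum of log P(value | label), one score per label."""
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)
            for feature, value in row.items():
                seen = self.feature_counts[label][feature][value]
                # Count the queried value as a possible value even if unseen.
                vocabulary = len(self.values[feature] | {value})
                score += math.log((seen + 1) / (count + vocabulary))
            scores[label] = score
        return scores

    def predict(self, row):
        """Part 2: apply Bayes' theorem and return the most probable label."""
        scores = self.log_scores(row)
        return max(sorted(scores), key=scores.get)


# Hypothetical training examples: demographic properties -> user group.
rows = [
    {"division": "materials", "department": "ceramics"},
    {"division": "materials", "department": "nanotech"},
    {"division": "materials", "department": "ceramics"},
    {"division": "logistics", "department": "planning"},
    {"division": "logistics", "department": "planning"},
]
labels = ["materials-science", "materials-science", "materials-science",
          "supply-chain", "supply-chain"]

model = NaiveBayes().fit(rows, labels)
new_user = {"division": "materials", "department": "polymers"}  # unseen department
print(model.predict(new_user))
```

Smoothing matters here: the new user's department never appears in the training examples, and without the added count every group would receive zero probability.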



Finally, sometimes providing recommendations to users only on product ratings will not be accurate; problems may arise when the recommended items originate from another domain to which the user is not related. This problem is known as a “blind recommendation” problem. This problem is mostly prevalent in the post-production phase of recommendation systems development. A solution to this problem is to use evaluators that continuously verify and validate recommendations with actual user details.

Conclusion

Online businesses are working very hard to increase their customer bases by providing competitive prices, products and services. In this frantic effort, recommendation systems can be an important means of bolstering sales and increasing information awareness among customers.

In this paper, we examined how recommendation systems can be applied to the scientific research domain. The successful implementation of an IRS provided our client with flexibility in using pre-existing information resources. It solved the problem of information overload by enabling scientists and researchers to use information resources more efficiently for their research. Our IRS offers testimony that a recommendation system that provides customizable recommendations can enable online companies to perform business more effectively.



Footnotes
1 http://mahout.apache.org/
2 https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html as of August 2011
3 http://www.netflixprize.com/
4 http://developer.ebay.com/DevZone/xml/docs/Reference/ebay/AddSellingManagerTemplate.html as of August 2011
5 http://www.practicalecommerce.com/articles/1942-10-Questions-on-Product-Recommendations as of August 2011
6 http://en.wikipedia.org/wiki/Collaborative_filtering
7 http://www2007.org/papers/paper570.pdf
8 http://www.manning.com/owen/Mahout_MEAP_CH01.pdf as of August 2011
9 http://blogoscoped.com/archive/2007-03-12-n67.html
10 http://news.cnet.com/8301-27076_3-10358662-248.html
11 http://en.wikipedia.org/wiki/Jaccard_index
12 http://en.wikipedia.org/wiki/Likelihood-ratio_test
13 http://en.wikipedia.org/wiki/Jaccard_index
14 http://hadoop.apache.org/
15 http://en.wikipedia.org/wiki/Naive_Bayes_classifier


References:
Mahout: http://www.ibm.com/developerworks/opensource/library/j-mahout/index.html
http://ilk.uvt.nl/~toine/phd-thesis/phd-thesis.pdf
GroupLens Research Project: http://www.grouplens.org/papers/pdf/ec-99.pdf
http://en.wikipedia.org/wiki/Recommender_system




About the Authors
Anup Prakash Warade is a Cognizant Associate. He develops and supports multiple projects for key
clients in the manufacturing and logistics industry. His interests include designing and developing Java
EE Web applications. Anup holds a bachelor of engineering degree in electronics from the University of
Pune in India. He can be reached at Anup.Warade@cognizant.com.

Vignesh Murali Natarajan is a Cognizant Senior Associate. He is a technical trainer and a Java architect
who is currently managing several Java EE applications for a manufacturing client. Vignesh holds a
bachelor’s degree in computer science and engineering from the University of Madras. He can be
reached at Vigneshmurali.Natarajan@cognizant.com.

Siddharth Sharad Chandak is a Cognizant Project Manager overseeing numerous business-critical
projects for clients in the manufacturing and logistics industry. His interests include architecting and
designing enterprise software applications, software performance tuning and project management.
Siddharth holds a master’s of science degree in computer science from Kansas State University. He can
be reached at Siddharth.Chandak@cognizant.com.




About Cognizant
Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process out-
sourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in
Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry
and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50
delivery centers worldwide and approximately 130,000 employees as of September 30, 2011, Cognizant is a member of
the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing
and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.



                                         World Headquarters                  European Headquarters                 India Operations Headquarters
                                         500 Frank W. Burr Blvd.             1 Kingdom Street                      #5/535, Old Mahabalipuram Road
                                         Teaneck, NJ 07666 USA               Paddington Central                    Okkiyam Pettai, Thoraipakkam
                                         Phone: +1 201 801 0233              London W2 6BD                         Chennai, 600 096 India
                                         Fax: +1 201 801 0243                Phone: +44 (0) 20 7297 7600           Phone: +91 (0) 44 4209 6000
                                         Toll Free: +1 888 937 3277          Fax: +44 (0) 20 7121 0102             Fax: +91 (0) 44 4209 6060
                                         Email: inquiry@cognizant.com        Email: infouk@cognizant.com           Email: inquiryindia@cognizant.com


© Copyright 2012, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any
means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is
subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.

How to Develop Online Recommendation Systems that Deliver Superior Business Performance

  • 1. • Cognizant 20-20 Insights How to Develop Online Recommendation Systems that Deliver Superior Business Performance Executive Summary combating this issue is what is known as a recom- mendation system. Over the past two decades, the Internet has emerged as the mainstream medium for online Many major e-commerce Websites are already shopping, social networking, e-mail and more. using recommendation systems to provide Corporations also view the Web as a potential relevant suggestions to their customers. The business accelerator. They see the huge volume of recommendations could be based on various transactional and interaction data generated by parameters, such as items popular on the the Internet as R&D that informs the creation of company’s Website; user characteristics such as new and more competitive services and products. geographical location or other demographic infor- mation; or past buying behavior of top customers. Several “e-movement” crusaders have discovered that customers spend significant amounts of This white paper presents an overview of how time researching products they seek before we are helping a Fortune 500 organization to purchasing. In a bid to assist customers in these implement a recommendation system. Moreover, efforts, and conserve precious time, these orga- this paper also sheds light on key challenges that nizations offer users suggestions of products may be encountered during implementation of a they may be interested in. This serves the dual recommendation system built on the open source purpose of not just attracting browsers but Apache Mahout,1 a large-data library of statistical converting them into buyers. For instance, an and analytical algorithms. online bookstore may know that a customer has interest in mobile technology based on previous Recommendation Systems site visits and suggest relevant titles to purchase. 
Recommendation systems can be considered An uninitiated user may be impressed by such as a valuable extension of traditional informa- suggestions. Suggestions (or “recommenda- tion systems used in industries such as travel tions” as they are popularly known) predict likes and hospitality. However, recommendation and dislikes of users. To offer meaningful rec- systems have mathematical roots and are more ommendations to site visitors, these companies akin to artificial intelligence (AI) than any other need to store huge amounts of data pertaining IT discipline. A recommendation system learns to different user profiles and their correspond- from a customer’s behavior and recommends a ing interests. This eventually culminates in infor- product in which users may be interested. At the mation overload, or difficulty in understanding heart of recommendation systems are machine- and making informed decisions. One solution to cognizant 20-20 insights | january 2012
  • 2. learning constructs. Leading e-commerce players Classification is a technique used to decide use recommendation engines that sift users’ past whether new input or a search term matches purchase histories to recommend products such a previously observed pattern. It is also used as magazine articles, books, goods, etc. Here is to detect suspicious network activity. Yahoo! how major e-commerce companies use recom- Mail8 uses classification to decide if an incoming mendation engines to improve their sales and message is spam. Image sharing sites like Picasa9 their customers’ shopping experience. use classification techniques to determine whether photos contain human faces. They • Amazon.com: Depending on past purchases then offer recommendations of people that are and user activity, the site recommends prod- identified in the user contacts list.10 ucts of user interest.2 • Netflix: Recommends DVDs in which a user A Robust System to Counter may be interested by category like drama, Information Overload comedy, action, etc. Netflix went so far as to We are working with a leading multinational man- offer a $1 million3 prize to researchers who ufacturing company that has numerous product could improve its recommendation engine. research labs with many scientists and research- • eBay: Collects user feedback about its prod- ers in numerous countries working on different ucts which is then used to recommend prod- technologies. To help facilitate scientific research, ucts to users who have exhibited similar be- and to buy the latest technology information, this haviors.4 client partnered with information providers such as Scopus, Knovel, etc. But despite these data Online companies that leverage recommendation sources, scientists and researchers were often systems can increase sales by 8% to12%.5 unable to find the right information to improve their research. 
Also, scientists across the globe Companies that succeed with recommendation were unable to collaborate and share technical engines are those that can quickly and efficiently information with each other. This situation is turn vast amounts of data into actionable infor- typical of companies dealing with information mation. overload. Anatomy of a Recommendation Engine To increase the informational awareness of The key component of a recommendation system scientists and other employees, the client wanted is data. This data may be garnered by a variety to create a system to recommend resources like of means such as customer ratings of products, patents, articles and journals from paid content feedback/reviews from purchasers, etc. This data providers. A successful system needed to learn will serve as the basis for recommendations to from user searches and be intelligent enough to users. After data collection, recommendation recommend popular resources similar to the ones systems use machine-learning algorithms to that a user is currently working from. The system find similarities and affinities between products was also expected to provide scientists with and users. Recommender logic programs are useful insights on information other scientists then used to build suggestions for specific user across the globe are using. Finally, the system profiles. This technique of filtering the input data was to serve as a platform to connect scientists and giving recommendations to users is also working on similar technologies. known as “collaborative filtering.”6 We helped to design and develop the system, Along with collaborative filtering, recommenda- which was dubbed “intelligent recommendation tion systems also use other machine-learning system” (IRS). Many of the problems the orga- techniques such as clustering and classification nization faced originated from the multiple pref- of data. 
Clustering is a technique which is used to erences and needs of users pertaining to their bundle large amounts of data together into similar individual research topics. To make the system categories. It is also used to see data patterns and adaptive to specific user requirements, the render huge amounts of data simpler to manage. solution proposed was to use a recommendation For instance, Google News7 creates clusters of system. As a first step towards the solution, the similar news information when grouping diverse large information base possessed by the client arrays of news articles. Many other search was categorized/grouped by specific criteria. engines use clustering to group results for similar After much contemplation of data and size of the search terms. user base, we decided to implement the system using the Apache Mahout framework. cognizant 20-20 insights 2
The Mahout framework is highly flexible and lets developers customize outcomes according to their ad hoc requirements. We then developed a customized algorithm to recommend relevant resources to scientists and researchers.

Building an Intelligent Recommendation System

The Solution

The most important purpose of an intelligent recommendation system (IRS) is to increase awareness among scientists about the areas they are exploring, the technologies on which their colleagues are working, and experts and their views on a particular discipline. Although the client possessed huge amounts of data, there was very little insight into the information scientists were seeking. Designing a recommendation system differs fundamentally from traditional software design: the overall design depends heavily on the choice of algorithms and the system architecture employed. By using Apache Mahout, the team selected a conventional open source framework that implements machine-learning algorithms.

Apache Mahout is a new Apache Software Foundation (ASF) project whose primary goal is to create scalable machine-learning algorithms that are free to use under the Apache license. The term “Mahout” is derived from the Hindi word that means elephant driver. The Mahout project started in 2008 as a subproject of Apache’s Lucene project, which is a popular search engine. Given the amount of data that the client possessed, it was imperative that the IRS be highly scalable. Mahout is considered a superior way of building recommendation systems because it implements all three machine-learning techniques: collaborative filtering, clustering and classification. Collaborative filtering is the primary technique used by Apache Mahout to provide recommendations. Given rating data along with a set of users and items, collaborative filtering generates recommendations in one of the following four ways.

• User-based: Recommendations are made based on users with similar characteristics.
• Item-based: Recommendations are based on similar items.
• Slope-one: A fast technique that offers recommendations based on previous user-rated items.
• Model-based: This approach compares the profile of an active user to aggregate user clusters, rather than the concrete profiles.

There are many algorithms that are used to calculate similarities between two entities. The choice of algorithm plays a vital role in determining the quality of the recommendations for a given scenario. Since the IRS for this client needed to offer recommendations to scientists based on their multiple preferences, we adopted user-based collaborative filtering (see Figure 1).

Figure 1: Recommendation System Execution Flow (components shown: Data Model, Similarity Algorithms, Neighborhoods, Recommender Logic Program, Evaluator Program and User Recommendations, which together make up the Intelligent Recommendation System)
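The execution flow in Figure 1 (data model, similarity, neighborhood, recommender) can be illustrated with a minimal user-based collaborative filtering sketch. This is plain Python written for illustration only; it is not Mahout's API and not the client's customized algorithm. The user names, article ids and the choice of a Tanimoto-style similarity over visited articles are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical boolean data model: user id -> set of article ids the user visited.
DATA_MODEL = {
    "alice": {"a1", "a2", "a3"},
    "bob": {"a1", "a2", "a4"},
    "carol": {"a2", "a3", "a5"},
    "dave": {"a6"},
}


def tanimoto(items_a, items_b):
    """Tanimoto (Jaccard) coefficient: size of intersection / size of union."""
    union = items_a | items_b
    if not union:
        return 0.0
    return len(items_a & items_b) / len(union)


def nearest_neighbors(user, data_model, n=2):
    """Return up to n (similarity, other_user) pairs with non-zero similarity."""
    scored = [
        (tanimoto(data_model[user], items), other)
        for other, items in data_model.items()
        if other != user
    ]
    scored = [pair for pair in scored if pair[0] > 0.0]
    # Highest similarity first; ties broken by name so the order is deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:n]


def recommend(user, data_model, n_neighbors=2, how_many=3):
    """Recommend unseen items, scored by the summed similarity of the neighbors who saw them."""
    seen = data_model[user]
    scores = defaultdict(float)
    for similarity, neighbor in nearest_neighbors(user, data_model, n_neighbors):
        for item in data_model[neighbor] - seen:
            scores[item] += similarity
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:how_many]]


if __name__ == "__main__":
    print(recommend("alice", DATA_MODEL))
```

In this toy data set, "dave" shares no articles with anyone, so the neighborhood is empty and no recommendations are produced. A real system would need a fallback for such users.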
Providing Recommendations

Creating an Input

Within the process of generating recommendations, a crucial step is for the IRS to generate rating and usage data. Such data could indicate which articles are popular and which ones had the highest read counts. To collect and consolidate this information, our IRS applied the Boolean data model. This technique tracks which article/product was visited by each particular user. This strategy is widely used in “social bookmarking” sites. In the IRS developed for this client, articles were rated in a range of 1 to 2, depending on the number of times an article was visited or read. A decimal number closer to 1 signifies fewer hits, while numbers closer to 2 signify higher usage. This rated data is then grouped into one of the following data models. In this context, data model refers to the abstraction layer which encapsulates product data and corresponding user ratings.

The following are the data models that can be used to create an IRS, depending on input data.

• In-memory data model: Data is prepared using programs which make objects of the data with product properties like product id, userid and ratings.
• File-based data: Data is provided in a comma-separated file along with a preference file. Implementation of a data refresh module will be required to reload data into the comma-separated file.
• Database-based data: Relational databases will be a good choice if the data is in the order of gigabytes. However, the implication is that a recommendation engine based on this model will be much slower than in-memory data representations.

For our client, an IRS was created using a file-based data model. This approach delivers superior performance and execution time compared with a database-based model. Also, to sustain the freshness of the file and thereby return newer articles and resources with reduced latency, a scheduled refresh module was used which runs daily to create the rating data for new resources.

Finding Similarity Between Users/Items

After creating a data model, the next step in building an IRS is finding similarities and patterns between users and items. Many algorithms can be used to find similarities: GenericUserSimilarity, TanimotoCoefficientSimilarity, PearsonCorrelationSimilarity, SpearmanCorrelationSimilarity and EuclideanDistanceSimilarity, to name a few. By far the most common algorithm is TanimotoCoefficientSimilarity, as it takes all types of input data, such as binary or numeric. TanimotoCoefficient provides similarity based on the ratings of each item. In the IRS we were building, due to the lack of rating data for all items, our team decided to customize the similarity algorithm based on user demographic properties. Demographic properties such as country, division, job title and department were used to establish similarity among users. The output of this algorithm is pivotal, as it is the base on top of which recommendations are built.

Generating Recommendations

Once similarity is established, special recommender logic programs are used to recommend items to users. This is very similar to user preferences. Our IRS used a customized recommender logic program which analyzes and matches the ratings of one user with similar user ratings and recommends articles/items which match the user interest. The output of this recommender logic program is the actual recommendations for the user. In the case of multiple domains such as fields of science (ceramics, nanotechnology, etc.), recommendations might not be accurate since user interests and expertise vary significantly. For such scenarios, recommendations should be evaluated against the user domain or user demographics. Programs which aid in cross-verification of the results of recommendation engines are known as “evaluators.” Our IRS validates recommended items against different relevance parameters such as scientist domain, technology and scientist interest. The output of this program proved beneficial in offering concrete recommendations to scientists and researchers and, hence, conserved time.

Key Challenges

The problems faced during development were compounded by the complexity of the algorithms used, as well as the common problems of software design. In addition, issues unique to AI applications, such as the need
for good quality input data, also arose. Common algorithms such as Tanimoto and Pearson similarity have several advantages, such as the ability to detect the quality and features of an item at the time of recommendation, especially when a user rating on a scale of 1 to 5 is available. Another advantage is that these algorithms are useful in domains where analyzing data is difficult or costly, such as gathering data about science articles where domain knowledge is required to rate an article.

The first challenge in the development of our IRS was that the system was brand new and lacked user rating information. A situation with a huge amount of input data but few users and no ratings is referred to as the “cold start problem.” A solution to this problem was to use the TanimotoCoefficient11 algorithm and a binary dataset to feed the system with the much-sought-after data during the initial use of the system. In the course of time, user rating data will be obtained, after which the similarity algorithm and input data structure can be changed. This initial dataset can be prepared by collecting data such as maximum hits of a particular product or tracking visits of articles. If a binary dataset is adopted, then the most suitable algorithms are TanimotoCoefficientSimilarity or LogLikelihoodSimilarity12, since these are specially defined for binary data and use the Jaccard13 coefficient formula. As the usage of the recommendation system grows, huge amounts of data will be collected. It is advisable at this point to use clustering mechanisms to sustain performance. A solution to deal with massive data by means of data clustering is Hadoop.14 Data clustering requires multiple machines to handle massive data sets and stores all resources that are required to provide recommendations.

The next challenge is providing recommendations to a user who is not part of a clear group; these users can be called “grey sheep.” The challenge of giving recommendations to new or unknown users is called the “grey sheep problem.” This problem mostly arises in Internet applications with large and diverse customer bases that access the IRS application. A solution to this problem is to look for similar users based on similar demographic information or characteristics. These users can be technically called “neighbors.” To determine similarity, a “classification” machine-learning mechanism can be used. Similar neighbors can be found by implementing different similarity algorithms or by checking similar user properties. The probability of this problem is significantly lower in intranet applications. The classification algorithm trains the computer to classify similar users in a particular group and teaches the IRS to find the specific group when a user is new to the system.

Mahout currently supports two types of classifier approaches: Naive Bayes classifier and Complementary Naive Bayes classifier. The Naive Bayes classifier is the faster method for problems such as grey sheep recommendations. This classifier includes two parts. First, it analyzes the existing documents or data and finds the common features among them. This step is also called preparing training examples. Second, classification is performed using the model that was created in the first step, using the Bayes theorem15 to predict the category of a new item (see Figure 2).

Figure 2: Diagram of Classification Technique (elements shown: input documents/items, training examples, training algorithms such as Bayes, predictor variable, model, new items/documents, estimated variable and predicted category/decision; annotations: use the user’s property to classify, based on demographic properties of the user; detect criteria to classify the users; predicted property on which the user can be classified; similar users are classified in a specific group)
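The two-part classification process described above (prepare training examples, then apply Bayes' theorem to a new item) can be sketched as a small categorical Naive Bayes classifier that assigns a new user to a group from demographic properties. This is a minimal illustration in plain Python, not Mahout's classifier. The property values, the group labels used as training data and the use of add-one (Laplace) smoothing are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Part 1: build a model from (properties, group) training examples."""
    group_counts = Counter()
    # value_counts[group][property][value] -> number of training users
    value_counts = defaultdict(lambda: defaultdict(Counter))
    values_seen = defaultdict(set)
    for properties, group in examples:
        group_counts[group] += 1
        for prop, value in properties.items():
            value_counts[group][prop][value] += 1
            values_seen[prop].add(value)
    return {
        "group_counts": group_counts,
        "value_counts": value_counts,
        "values_seen": values_seen,
    }


def classify(model, properties):
    """Part 2: pick the group with the highest posterior log-probability."""
    total = sum(model["group_counts"].values())
    best_group, best_score = None, -math.inf
    for group in sorted(model["group_counts"]):
        count = model["group_counts"][group]
        score = math.log(count / total)  # log prior of the group
        for prop, value in properties.items():
            # Add-one smoothing so an unseen value does not zero out the product.
            n_values = len(model["values_seen"][prop] | {value})
            seen = model["value_counts"][group][prop][value]
            score += math.log((seen + 1) / (count + n_values))
        if score > best_score:
            best_group, best_score = group, score
    return best_group


# Hypothetical training examples: demographic properties -> research group.
EXAMPLES = [
    ({"division": "materials", "job_title": "researcher"}, "ceramics"),
    ({"division": "materials", "job_title": "engineer"}, "ceramics"),
    ({"division": "electronics", "job_title": "researcher"}, "nanotechnology"),
    ({"division": "electronics", "job_title": "engineer"}, "nanotechnology"),
]

if __name__ == "__main__":
    model = train(EXAMPLES)
    print(classify(model, {"division": "electronics", "job_title": "manager"}))
```

Once a new user has been placed in a group, the recommendations already computed for that group's members can serve as a starting point until the user accumulates usage data of their own.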
Finally, providing recommendations to users based only on product ratings is sometimes inaccurate; problems may arise when the recommended items originate from another domain to which the user is not related. This problem is known as the “blind recommendation” problem. It is mostly prevalent in the post-production phase of recommendation systems development. A solution to this problem is to use evaluators that continuously verify and validate recommendations with actual user details.

Conclusion

Online businesses are working very hard to increase their customer bases by providing competitive prices, products and services. In this frantic effort, recommendation systems can be an important means of bolstering sales and increasing information awareness among customers.

In this paper, we examined how recommendation systems can be applied to the scientific research domain. The successful implementation of an IRS provided our client with flexibility in using pre-existing information resources. It solved the problem of information overload by enabling scientists and researchers to use information resources more efficiently for their research. Our IRS offers testimony that a recommendation system that provides customizable recommendations can enable online companies to perform business more effectively.

Footnotes
1 http://mahout.apache.org/
2 https://cwiki.apache.org/MAHOUT/mahout-on-elastic-mapreduce.html as of August 2011
3 http://www.netflixprize.com/
4 http://developer.ebay.com/DevZone/xml/docs/Reference/ebay/AddSellingManagerTemplate.html as of August 2011
5 http://www.practicalecommerce.com/articles/1942-10-Questions-on-Product-Recommendations as of August 2011
6 http://en.wikipedia.org/wiki/Collaborative_filtering
7 http://www2007.org/papers/paper570.pdf
8 http://www.manning.com/owen/Mahout_MEAP_CH01.pdf as of August 2011
9 http://blogoscoped.com/archive/2007-03-12-n67.html
10 http://news.cnet.com/8301-27076_3-10358662-248.html
11 http://en.wikipedia.org/wiki/Jaccard_index
12 http://en.wikipedia.org/wiki/Likelihood-ratio_test
13 http://en.wikipedia.org/wiki/Jaccard_index
14 http://hadoop.apache.org/
15 http://en.wikipedia.org/wiki/Naive_Bayes_classifier

References
Mahout: http://www.ibm.com/developerworks/opensource/library/j-mahout/index.html
http://ilk.uvt.nl/~toine/phd-thesis/phd-thesis.pdf
GroupLens Research Project: http://www.grouplens.org/papers/pdf/ec-99.pdf
http://en.wikipedia.org/wiki/Recommender_system
About the Authors

Anup Prakash Warade is a Cognizant Associate. He develops and supports multiple projects for key clients in the manufacturing and logistics industry. His interests include designing and developing Java EE Web applications. Anup holds a bachelor’s of engineering degree in electronics from the University of Pune in India. He can be reached at Anup.Warade@cognizant.com.

Vignesh Murali Natarajan is a Cognizant Senior Associate. He is a technical trainer and a Java architect who is currently managing several Java EE applications for a manufacturing client. Vignesh holds a bachelor’s degree in computer science and engineering from the University of Madras. He can be reached at Vigneshmurali.Natarajan@cognizant.com.

Siddharth Sharad Chandak is a Cognizant Project Manager overseeing numerous business-critical projects for clients in the manufacturing and logistics industry. His interests include architecting and designing enterprise software applications, software performance tuning and project management. Siddharth holds a master’s of science degree in computer science from Kansas State University. He can be reached at Siddharth.Chandak@cognizant.com.

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 130,000 employees as of September 30, 2011, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world.
Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: inquiry@cognizant.com

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: infouk@cognizant.com

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: inquiryindia@cognizant.com

© Copyright 2012, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.