Aardvark
The Anatomy of a Large-Scale Social Search
                  Engine
                    by
 Damon Horowitz and Sepandar D. Kamvar




             Presented by
             Shalini Sahoo
                11/21/2011
Introduction
•   Library vs village paradigm
•   Traditional IR approaches follow the library paradigm
•   In a village, information is passed from person to person
•   The retrieval task consists of finding the right person (expert
    in that field)
•   Example queries: (Pg 1) “Do you have any good baby-sitter
    recommendations in Palo-Alto for my 6-year-old twins? I’m
    looking for somebody that won’t let them watch TV.”
    “Is it safe for me to take a cab alone at 3 am from SFO airport
    to my home in Berkeley?”

                                  2
Differences

   Library                                   Village

   Keywords are used to search               Natural language is used to ask questions

   Short queries (~2.93 words)               Verbose, highly contextualized and
                                             subjective queries (~18.6 words)

   Knowledge base created by content         Community forms the knowledge base
   publishers

   Trust is based on authority               Trust is based on intimacy

   Retrieval involves finding the right      Retrieval involves finding the right
   document                                  person



                                      3
Aardvark
•   Is* a social search engine based on the village paradigm
•   It connects users live with friends or friends-of-friends who
    are able to answer their questions
•   Users submit questions via Aardvark’s website, email,
    instant messenger or app on mobile devices
•   It identifies and facilitates a live chat or email conversation
    with one or more topic experts in the user’s extended social
    network
•   It was mainly used for asking subjective questions for
    which human judgement or recommendation was desired

* was - Aardvark was shut down on September 30th 2011
                                4
Aardvark
   •   It was originally developed by The
       Mechanical Zoo, a San Francisco-based
       startup founded in 2007 by
       Max Ventilla, Damon Horowitz




   •   Early 2008: a prototype version was launched
   •   March 2009: it was released to the public
   •   February 2010: Google acquired it for $50 million
   •   September 2011: Google shut down Aardvark*


   * A fall spring-clean: http://googleblog.blogspot.com/2011/09/fall-spring-clean.html
                                          5
Outline
•   Overview

•   Anatomy

•   Examples

•   Analysis

•   Evaluation

•   Discussion




                 6
Outline
➡ Overview
    ‣   Main Components
    ‣   The Initiation of a User
    ‣   The Life of a Query
•   Anatomy

•   Examples

•   Analysis

•   Evaluation

•   Discussion




                                   7
Main Components
•   Crawler and Indexer: To find and label resources that
    contain information

•   Query Analyzer: To understand the user’s information
    need

•   Ranking Function: To select the best resources to provide
    the information

•   User Interface: To present the information to the user in
    an accessible and interactive form




                               8
The Initiation of a User
•       The first step involves forming the “Social Graph”
•       Users can build their network by:
    -     importing contacts from social networking sites like Facebook or LinkedIn
    -     importing contacts from webmail programs like Gmail or Yahoo Mail
    -     inviting friends to join

•       Users in a common group or community (e.g. studied at UT
        Austin, Google summer interns 2011) are added to the social
        graph
•       Each user’s topical expertise is indexed from several sources:
    -     Users can indicate the topics in which they have expertise
    -     A user’s friends can select topics on which they trust the user’s opinion
    -     Users can indicate their personal webpages or blogs
    -     The user’s status updates from Facebook or Twitter (if available)

                                       9
The Initiation of a User
•   Forward Index: stores the userId, a scored list of topics,
    further scores about user behavior
•   From this forward index, an inverted index is constructed
•   Inverted Index: stores each topicId and a scored list of
    userIds (with expertise in that topic)
•   Inverted index also stores scored list of userIds for features
    like answer quality and response time
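
As a rough illustration of the two indexes described above, the sketch below derives an inverted index from a forward index. The dictionary layout, user IDs, topic names and scores are all hypothetical; the slides do not describe Aardvark's actual storage format.

```python
from collections import defaultdict

# Hypothetical forward index: userId -> {topicId: expertise score}.
forward_index = {
    "u1": {"restaurants": 0.7, "travel": 0.3},
    "u2": {"restaurants": 0.2, "programming": 0.8},
}


def build_inverted_index(forward):
    """Invert userId -> topics into topicId -> scored list of userIds."""
    inverted = defaultdict(list)
    for user_id, topics in forward.items():
        for topic_id, score in topics.items():
            inverted[topic_id].append((user_id, score))
    # Sort each posting list so the strongest candidates come first.
    for postings in inverted.values():
        postings.sort(key=lambda pair: pair[1], reverse=True)
    return dict(inverted)


inverted_index = build_inverted_index(forward_index)
```

Looking up `inverted_index["restaurants"]` then yields the users with expertise in that topic, highest score first.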




                               10
The Life of a Query
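   [Figure: Aardvark architecture. Components shown in the diagram:
    -   Gateways: IM, Email, Twitter, iPhone, Web, SMS
    -   Transport Layer, passing messages (Msgs) to the Conversation Manager
    -   Conversation Manager, linked to the Routing Engine by the labels
        RSR and {RS}*
    -   Question Analyzer (classifiers)
    -   Routing Engine and Index
    -   Importers: Web Content, Facebook, LinkedIn, etc.
    -   Database: Users, Graph, Topics, Content]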




                                        11
Outline
✓ Overview

➡ Anatomy
    ‣   The Model
    ‣   Social Crawling
    ‣   Indexing People
    ‣   Analyzing Questions
    ‣   Ranking Algorithm
    ‣   User Interface

•   Examples

•   Analysis

•   Evaluation

•   Discussion

                              12
The Model
•   A network variant of an aspect model is used
    -   It associates an unobserved class variable t ∊ T with each
        observation (i.e., the successful answer of question q by
        user ui):

            p(u_i \mid q) = \sum_{t \in T} p(u_i \mid t) \, p(t \mid q)        (1)

    -   It defines a query-independent probability of success for each
        potential asker/answerer pair (ui, uj), based upon their degree
        of social connectedness and profile similarity
•   The scoring function s(ui, uj, q) is defined as a composition
    of the two probabilities:

            s(u_i, u_j, q) = p(u_i \mid u_j) \cdot p(u_i \mid q)
                           = p(u_i \mid u_j) \sum_{t \in T} p(u_i \mid t) \, p(t \mid q)        (2)

    Note (from the paper): Equation 1 is a simplification of what
    Aardvark actually uses to match queries to answerers, presented
    this way for clarity and conciseness.

                              13
The Model
•   The goal of the ranking algorithm is: given a question q
    from user uj, return a ranked list of users ui ∊ U that
    maximizes the scoring function s(ui, uj, q)
•   The scoring function allows real-time routing because much
    of the computation is done offline
•   The only term which needs to be computed at query-time is
    p(t|q)
•   The distribution p(ui|t) assigns users to topics, and the
    distribution p(ui|uj) defines the Aardvark social graph; both
    are computed by the Indexer at signup time
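
The scoring function can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Aardvark's implementation: the function names, the dictionary layout and all probabilities are hypothetical. The two offline tables stand in for p(ui|t) and p(ui|uj), and only p(t|q) is supplied at query time.

```python
def topic_expertise(p_user_given_topic, p_topic_given_q, user):
    """p(ui|q) = sum over topics t of p(ui|t) * p(t|q)."""
    return sum(
        p_user_given_topic.get(topic, {}).get(user, 0.0) * p_tq
        for topic, p_tq in p_topic_given_q.items()
    )


def rank_answerers(asker, p_topic_given_q, p_user_given_topic, p_connected):
    """Rank candidates ui by s(ui, uj, q) = p(ui|uj) * p(ui|q)."""
    candidates = {u for users in p_user_given_topic.values() for u in users}
    candidates.discard(asker)  # the asker is never routed their own question
    scores = {
        u: p_connected.get((u, asker), 0.0)
        * topic_expertise(p_user_given_topic, p_topic_given_q, u)
        for u in candidates
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Hypothetical tables computed offline: p(ui|t) and p(ui|uj).
p_user_given_topic = {
    "restaurants": {"alice": 0.6, "bob": 0.4},
    "travel": {"alice": 0.1, "bob": 0.9},
}
p_connected = {("alice", "carol"): 0.5, ("bob", "carol"): 0.25}

# Hypothetical query-time output of the Question Analyzer: p(t|q).
p_topic_given_q = {"restaurants": 0.75, "travel": 0.25}

ranking = rank_answerers("carol", p_topic_given_q, p_user_given_topic, p_connected)
```

With these made-up numbers, alice outranks bob even though bob has the higher topical score, because alice is more strongly connected to the asker.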



                              14
Social Crawling
•   In Aardvark, people form the knowledge base rather than
    documents
•   The more active users there are, the more potential
    answerers
•   So it is important for Aardvark to create a good experience
    for users so that they remain active and inclined to invite
    their friends
•   The breadth of the Aardvark knowledge base depends upon
    designing interfaces and algorithms to update the topic lists
    for each user over time



                              15
Indexing People
•       The distribution p(t|ui) of topics known by user ui is
        computed from the following sources:
    -     Users can indicate the topics in which they have expertise
    -     A user’s friends can select topics on which they trust the user’s opinion
    -     Users can indicate their personal webpages or blogs
    -     The user’s status updates from Facebook or Twitter (if available)

•       Over time, Aardvark learns which topics not to send to a
        particular user by keeping track of when the user:
    -     explicitly “mutes” a topic
    -     declines to answer questions about a topic when given the chance
    -     receives negative feedback on an answer from the asker



                                      16
Indexing People
•   Periodically, a topic strengthening algorithm is used
•   For a user ui and his group of friends U, and for some topic
    t, if p(t|ui) ≠ 0, then

        s(t \mid u_i) = p(t \mid u_i) + \gamma \sum_{u \in U} p(t \mid u)

    where γ is a small constant; the s values are then renormalized
    to form probabilities
•   Other smoothing techniques are used to record the
    possibility that a user might be able to answer questions about
    additional topics not explicitly mentioned in their profile
•   p(ui|t) is computed using Bayes’ law, with a uniform
    distribution for p(ui):

        p(u_i \mid t) = \frac{p(t \mid u_i) \, p(u_i)}{p(t)}        (3)

                                    17
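
The topic strengthening step and the Bayes inversion can be sketched as follows. This is an illustration only: the value of gamma, the user names and the topic distributions are hypothetical, and p(t) is assumed here to be the marginal of p(t|u) under the uniform prior, which the slide does not spell out.

```python
def strengthen_topics(p_topic_given_user, user, friends, gamma=0.1):
    """s(t|ui) = p(t|ui) + gamma * sum over friends u of p(t|u), renormalized.

    Only topics the user already has (p(t|ui) != 0) are strengthened.
    gamma=0.1 is a made-up value; the source only says "a small constant".
    """
    scores = {}
    for topic, p in p_topic_given_user[user].items():
        boost = 0.0
        if p != 0:
            boost = sum(p_topic_given_user[f].get(topic, 0.0) for f in friends)
        scores[topic] = p + gamma * boost
    total = sum(scores.values())
    if total == 0:
        return scores
    return {topic: s / total for topic, s in scores.items()}


def user_given_topic(p_topic_given_user, topic):
    """p(ui|t) = p(t|ui) * p(ui) / p(t), with a uniform prior p(ui)."""
    p_user = 1.0 / len(p_topic_given_user)
    # Assumption: p(t) is the marginal sum over users of p(t|u) * p(u).
    p_topic = sum(d.get(topic, 0.0) * p_user for d in p_topic_given_user.values())
    if p_topic == 0:
        return {}
    return {
        u: d.get(topic, 0.0) * p_user / p_topic
        for u, d in p_topic_given_user.items()
    }


# Hypothetical topic distributions p(t|u) for a user and two friends.
p_topic_given_user = {
    "ui": {"cooking": 0.5, "travel": 0.5},
    "f1": {"cooking": 1.0},
    "f2": {"cooking": 0.5, "music": 0.5},
}

strengthened = strengthen_topics(p_topic_given_user, "ui", ["f1", "f2"])
cooking_experts = user_given_topic(p_topic_given_user, "cooking")
```

Because both friends know about cooking, the user's cooking score rises relative to travel after renormalization.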
Connectedness
•       Connectedness between users p(ui|uj) is computed using a
        weighted cosine similarity over the following feature set:
    -      Social connection
    -      Demographic similarity
    -      Profile similarity
    -      Vocabulary match
    -      Chattiness match
    -      Verbosity match
    -      Politeness match
    -      Speed match

•       p(ui|uj) is stored in the social graph
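
A weighted cosine similarity can be sketched as below. The slide does not say how the listed features are encoded or weighted, so this example assumes each user is represented by a numeric feature vector and that each feature has a non-negative weight; both the vectors and the weights are hypothetical.

```python
import math


def weighted_cosine(a, b, weights):
    """Cosine similarity in which feature k contributes with weight w_k."""
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no information about at least one of the users
    return dot / (norm_a * norm_b)


# Hypothetical two-feature vectors and weights, for illustration only.
same_direction = weighted_cosine([1.0, 0.0], [2.0, 0.0], [2.0, 1.0])
orthogonal = weighted_cosine([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])
partial = weighted_cosine([1.0, 1.0], [1.0, 0.0], [1.0, 1.0])
```

Raising a feature's weight makes agreement or disagreement on that feature count for more in the final similarity.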



                                    18
Analyzing Questions
•       The main goal of the Question Analyzer is: given a question
        q, determine a scored list of topics p(t|q)
•       The following classifiers are run on a question:
    -     NonQuestion Classifier
    -     InappropriateQuestion Classifier
    -     TrivialQuestion Classifier
    -     LocationSensitive Classifier

•       Next, the list of relevant topics is produced by merging
        outputs from several TopicMapper algorithms
    -     KeywordMatchTopicMapper
    -     TaxonomyTopicMapper
    -     SalientTermTopicMapper
    -     UserTagTopicMapper
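
One simple way to merge several scored topic lists into a single distribution p(t|q) is to sum the scores per topic and normalize. The slides do not say how Aardvark actually merges the TopicMapper outputs, so the rule below, the mapper outputs and the topic names are assumptions made purely for illustration.

```python
from collections import defaultdict


def merge_topic_scores(mapper_outputs):
    """Sum per-topic scores across mappers, then normalize to sum to 1."""
    merged = defaultdict(float)
    for scores in mapper_outputs:
        for topic, score in scores.items():
            merged[topic] += score
    total = sum(merged.values())
    if total == 0:
        return {}
    return {topic: score / total for topic, score in merged.items()}


# Hypothetical outputs of two mappers for one restaurant question.
keyword_scores = {"restaurants": 2.0}
taxonomy_scores = {"restaurants": 1.0, "san francisco": 1.0}

p_topic_given_q = merge_topic_scores([keyword_scores, taxonomy_scores])
```

Topics that several mappers agree on end up with a larger share of the probability mass.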
                                    19
Analyzing Questions
•   The TopicMapper algorithms are continuously evaluated
•   For a given question, the topics used to select an
    answerer, along with a much larger list of possibly relevant
    topics, are assigned scores by two human judges
•   This evaluation yields 89% precision and 84% recall of relevant topics
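
For reference, precision and recall over topic sets can be computed as below. This sketch assumes the judges' scores have been reduced to a binary relevant/not-relevant label per topic; the topic names are made up.

```python
def precision_recall(returned, relevant):
    """Precision = hits / returned topics; recall = hits / relevant topics."""
    returned, relevant = set(returned), set(relevant)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: topics returned by the mappers versus topics
# that the human judges marked as relevant.
precision, recall = precision_recall(
    returned={"restaurants", "travel", "music"},
    relevant={"restaurants", "travel", "nightlife", "dining"},
)
```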




                               20
Ranking Algorithm
•       The topic list generated by the Question Analyzer is sent to
        the Routing Engine which then determines the top
        answerers for the given question
•       The main factors that determine the ranking of users are:
    -     Topic expertise p(ui|q)
    -     Connectedness p(ui|uj)
    -     Availability

•       From this ordered list of users, the Routing Engine then
        filters out users who should not be contacted
    -     based on their preferred time of contact
    -     based on how often they have been contacted in the
          recent past
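
The filtering step can be sketched as a pass over the already ranked list. The field names, the contact-count threshold and the sample users are hypothetical; the slides name the two criteria but not how they are represented.

```python
from datetime import time


def is_contactable(user, now, max_recent_contacts=3):
    """Keep a user only inside their preferred hours and under the contact cap."""
    start, end = user["preferred_hours"]
    in_window = start <= now <= end
    return in_window and user["recent_contacts"] < max_recent_contacts


def filter_candidates(ranked_users, now, max_recent_contacts=3):
    """Drop users who should not be contacted, preserving the ranking order."""
    return [u for u in ranked_users if is_contactable(u, now, max_recent_contacts)]


# Hypothetical ranked list, best candidate first.
ranked = [
    {"name": "alice", "preferred_hours": (time(9), time(21)), "recent_contacts": 1},
    {"name": "bob", "preferred_hours": (time(9), time(17)), "recent_contacts": 0},
    {"name": "carol", "preferred_hours": (time(8), time(22)), "recent_contacts": 5},
]

contactable = filter_candidates(ranked, now=time(19, 30))
```

Here bob is skipped because 19:30 falls outside his preferred hours, and carol because she has been contacted too often recently.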


                                     21
User Interface
•   The various user interfaces of Aardvark are built on top of
    real-time communication channels such as IM, email,
    SMS, iPhone, Twitter and Web-based messaging




                              22
Outline
✓ Overview

✓ Anatomy
➡ Examples

•   Analysis

•   Evaluation

•   Discussion




                 25
Examples
     EXAMPLE 1
     (Question from Mark C./M/LosAltos,CA)
     I am looking for a restaurant in San Francisco that is open for lunch. Must be
     very high-end and fancy (this is for a small, formal, post-wedding gathering of
     about 8 people).
     (+4 minutes -- Answer from Nick T./28/M/SanFrancisco,CA -- a friend of your friend
     Fritz Schwartz)
     fringale (fringalesf.com) in soma is a good bet; small, fancy, french (the french
     actually hang out there too). Lunch: Tuesday - Friday: 11:30am - 2:30pm
     (Reply from Mark to Nick)
     Thanks Nick, you are the best PM ever!
     (Reply from Nick to Mark)
     you're very welcome. hope the days they're open for lunch work...

     EXAMPLE 2
     (Question from James R./M/TwinPeaksWest,SF)
     What is the best new restaurant in San Francisco for a Monday business dinner?
     Fish & Farm? Gitane? Quince (a little older)?
     (+7 minutes -- Answer from Paul D./M/SanFrancisco,CA -- A friend of your friend
     Sebastian V.)
     For business dinner I enjoyed Kokkari Estiatorio at 200 Jackson. If you prefer a
     place in SOMA i recommend Ozumo (a great sushi restaurant).
     (Reply from James to Paul)
     thx I like them both a lot but I am ready to try something new
     (+1 hour -- Answer from Fred M./29/M/Marina,SF)
     Quince is a little fancy... La Mar is pretty fantastic for cevice - like the
     Slanted Door of peruvian food...

     EXAMPLE 3
     (Question from Brian T./22/M/Castro,SF)
     What is a good place to take a spunky, off-the-cuff, social, and pretty girl for a
     nontraditional, fun, memorable dinner date in San Francisco?
     (+4 minutes -- Answer from Dan G./M/SanFrancisco,CA)
     Start with drinks at NocNoc (cheap, beer/wine only) and then dinner at RNM
     (expensive, across the street).
     (Reply from Brian to Dan)
     Thanks!
     (+6 minutes -- Answer from Anthony D./M/Sunnyvale,CA -- you are both in the Google
     group)
     Take her to the ROTL production of Tommy, in the Mission. Best show i've seen all
     year!
     (Reply from Brian to Anthony)
     Tommy as in the Who's rock opera? COOL!
     (+10 minutes -- Answer from Bob F./M/Mission,SF -- you are connected through
     Mathias' friend Samantha S.)
     Cool question. Spork is usually my top choice for a first date, because in addition
     to having great food and good really friendly service, it has an atmosphere that's
     perfectly in between casual and romantic. It's a quirky place, interesting funny
     menu, but not exactly non-traditional in the sense that you're not eating while
     suspended from the ceiling or anything
                                                      26
Outline
✓ Overview

✓ Anatomy
✓ Examples
➡ Analysis

•   Evaluation

•   Discussion




                 29
Analysis
•   As of October 2009, Aardvark had 90,361 registered users
•   The average query volume was 3,167.2 questions per day in
    this period




                             30
Analysis
•       Mobile users were particularly active
    -     It is easier to reply to questions in the form of IM or SMS on phone
    -     People are comfortable using natural language in an IM setting
          rather than in a web search setting

•       Questions are highly contextualized
    -     Average query length is 18.6 words

•       Questions often have a subjective element
    [Figure: Categories of questions sent to Aardvark. Categories shown:
    websites & internet apps; business research; music, movies, TV, books;
    sports & recreation; home & cooking; finance & investing;
    technology & programming; miscellaneous; Aardvark; local services;
    travel; product reviews & help; restaurants & bars]
                                         31
Analysis
•   Questions get answered quickly
    [Figure 9: Distribution of questions and answering times. Y-axis:
    questions answered (×10^4); answering-time buckets: 0-3 min, 3-6 min,
    6-12 min, 12-30 min, 30 min-1 hr, 1-4 hr, 4+ hr]
•   Answers are of high quality
    -   Answers are comprehensive and concise
    -   Median answer length was 22.2 words
    -   70.4% of inline feedback rated answers as ‘good’, 14.1% rated
        them as ‘OK’ and 15.5% rated them as ‘bad’

                               32
Analysis
•   There is a broad range of answerers
•   Social proximity matters
•   People are indexable




                               33
Outline
✓ Overview

✓ Anatomy
✓ Examples
✓ Analysis
➡ Evaluation

•   Discussion




                 34
Evaluation
•   Compared to Google!
•   “Do you want to help Aardvark run an experiment?” was
    inserted into a random sample of active questions
•   Users were asked to reformulate their question as a query
    and search on Google
•   Users timed how long it took to find a satisfactory result and
    also rated the quality of the answers
•   71.5% of the queries were answered successfully on Aardvark,
    with a mean rating of 3.93
•   70.5% of the queries were answered successfully on Google,
    with a mean rating of 3.07
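
The two summary figures per engine (share of queries answered, mean rating) could be computed roughly as in the sketch below. The record layout, the 1-5 rating scale, the sample data, and the choice to average ratings over answered queries only are all assumptions made for illustration; the slides do not specify them.

```python
from statistics import mean

# Hypothetical trial records: (engine, found_satisfactory_answer, rating or None).
# The data below is invented for illustration only.
trials = [
    ("aardvark", True, 4),
    ("aardvark", True, 5),
    ("aardvark", False, None),
    ("google", True, 3),
    ("google", False, None),
    ("google", True, 2),
]


def summarize(trials, engine):
    """Return (success_rate, mean_rating) for one engine.

    Assumption: the mean rating is taken over answered queries only.
    """
    rows = [t for t in trials if t[0] == engine]
    answered = [t for t in rows if t[1]]
    success_rate = len(answered) / len(rows)
    mean_rating = mean(t[2] for t in answered)
    return success_rate, mean_rating


for engine in ("aardvark", "google"):
    rate, rating = summarize(trials, engine)
    print(f"{engine}: {rate:.1%} answered, mean rating {rating:.2f}")
```

Two engines with nearly equal success rates can still differ clearly in mean rating, which is the pattern the reported numbers show.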



                              35
Outline
✓ Overview

✓ Anatomy
✓ Examples
✓ Analysis
✓ Evaluation
➡ Discussion




               36
Discussion
•   Participation Fatigue: (Pg 9) “86.7% users have been
    contacted by Aardvark with a request to answer a question,
    and of those, 70% have looked at the question and 38%
    could answer a question. 20% of the users accounted for
    85% of answers” What happens when this thin slice of
    users gets overwhelmed and starts dropping out?
•   Availability: There can be cases when the topic expert(s) in
    your social graph are not online. Would having
    an “offline” mode be helpful?
•   Evaluation: Could we get a better understanding of how well
    Aardvark worked if it were compared to another social
    search engine that works on the same paradigm? How
    could that be achieved?
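
As a rough way to quantify the participation-fatigue concern, the share of answers contributed by the most active users can be computed from per-user answer counts. The function name, the rounding rule for the cutoff, and the sample counts are hypothetical; this is a sketch, not how the paper derived its figures.

```python
def top_share(answer_counts, top_fraction=0.2):
    """Fraction of all answers contributed by the most active users.

    answer_counts: one integer per user (number of answers given).
    top_fraction: share of users treated as the "thin slice" (assumed 20%).
    """
    counts = sorted(answer_counts, reverse=True)
    # At least one user is always included in the top group.
    k = max(1, round(len(counts) * top_fraction))
    total = sum(counts)
    return sum(counts[:k]) / total if total else 0.0


# Invented counts for ten users, chosen so the top two give 85 of 100 answers.
counts = [50, 35, 5, 4, 3, 1, 1, 1, 0, 0]
print(f"Top 20% of users gave {top_share(counts):.0%} of answers")
```

Tracking this ratio over time would show whether answering load is spreading out or concentrating further on the same users.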

                              37

Aardvark shalini

  • 1. Aardvark The Anatomy of a Large-Scale Social Search Engine by Damon Horowitz and Sepandar D. Kamvar Presented by Shalini Sahoo 11/21/2011
  • 2. Introduction • Library vs village paradigm • Traditional IR approaches follow the library paradigm • In a village, information is passed from person to person • The retrieval task consists of finding the right person (expert in that field) • Example queries: (Pg 1) “Do you have any good baby-sitter recommendations in Palo-Alto for my 6-year-old twins? I’m looking for somebody that won’t let them watch TV.” “Is it safe for me to take a cab alone at 3 am from SFO airport to my home in Berkeley?” 2
  • 3. Differences Library Village Keywords are used to search Natural language used to ask questions Verbose, highly contextualized and Short queries (~2.93 words) subjective (~18.6 words) Knowledge base created by content Community forms the knowledge base publishers Trust is based on authority Trust is based on intimacy Retrieval involves finding the right Retrieval involves finding the right document person 3
  • 4. Aardvark • Is* a social search engine based on the village paradigm • It connects users live with friends or friends-of-friends who are able to answer their questions • Users submit questions via Aardvark’s website, email, instant messenger or app on mobile devices • It identifies and facilitates a live chat or email conversation with one or more topic experts in the users extended social network • It was mainly used for asking subjective questions for which human judgement or recommendation was desired * was - Aardvark was shut down on September 30th 2011 4
  • 5. Aardvark • It was originally developed by The Mechanical Zoo, a San-Francisco- based startup founded in 2007 by Max Ventilla, Damon Horowitz A prototype version It was released Google acquired it Google shut down was launched to public for $50 million Aardvark* Early 2008 March 2009 February 2010 September 2011 * A fall spring-clean: http://googleblog.blogspot.com/2011/09/fall-spring-clean.html 5
  • 6. Outline • Overview • Anatomy • Examples • Analysis • Evaluation • Discussion 6
  • 7. Outline ➡ Overview ‣ Main Components ‣ The Initiation of a User ‣ The Life of a Query • Anatomy • Examples • Analysis • Evaluation • Discussion 7
  • 8. Main Components • Crawler and Indexer: To find and label resources that contain information • Query Analyzer: To understand the user’s information need • Ranking Function: To select the best resources to provide the information • User Interface: To present the information to the user in an accessible and interactive form 8
  • 9. The Initiation of a User • The first step involves forming the “Social Graph” • Users can import contacts from: - social networking sites like Facebook or LinkedIn - webmail program like Gmail or Yahoo mail - invite friends to join • Users in a common group or community (e.g. studied at UT Austin, Google summer interns 2011) are added to the social graph • User’s topical expertise information is indexed: - Users can indicate the topics in which they have expertise - User’s friend can select topics for which they trust the user’s opinion - Users can indicate their personal webpages or blogs - User’s status updates from Facebook or Twitter (if available) 9
  • 10. The Initiation of a User • Forward Index: stores the userId, a scored list of topics, further scores about user behavior • From this forward index, an inverted index is constructed • Inverted Index: stores each topicId and a scored list of userIds (with expertise in that topic) • Inverted index also stores scored list of userIds for features like answer quality and response time 10
  • 11. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 12. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 13. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 14. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 15. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 16. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 17. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 18. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 19. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 20. The Life of a Query Question Analyzer classifiers Gateways IM Email Transport Msgs Conversation Manager RSR Layer Twitter Routing iPhone Engine Web {RS}* SMS Index Importers Database Web Content Users Facebook Graph LinkedIn Topics Etc. Content 11
  • 21. Outline ✓ Overview ➡ Anatomy ‣ The Model ‣ Social Crawling ‣ Indexing People ‣ Analyzing Questions ‣ Ranking Algorithm ‣ User Interface • Examples • Analysis • Evaluation • Discussion 12
  • 22. , Twitter or 3. ANATOMY xtracts top- sis (see Sec- 3.1 The Model ark observes s. The Model (or electing The core of Aardvark is a statistical model for routing questions to potential answerers. We use a network variant of what has been called an aspect model [12], that has two recorded in primary features. First, it associates an unobserved class scored list of variable t 2 T with each observation (i.e., the successful er’s behavior • A network variantqof an aspect In other is usedthe proba- answer of question by user ui ). model words, the Forward - bility p(ui |q) that user i will successfully answer tquestion q each It associates an unobserved class variable ∊ T with The Inverted depends on whether q is about the topics t inof question q by observation (i.e., the successful answer which ui has userIds that expertise1 : pics, the In- user ui) X features like p(ui |q) = p(ui |t)p(t|q) (1) The second main feature of the model is that it defines t2T breadth of th a query-independent probability of success for each poten- signing inter or a user are 1 nd ready to - Equation 1 a query-independent probabilitydegree of tialdefines is a simplification based upon their of success for extended It asker/answerer pair (ui , uj ), of what Aardvark actually an uses to connectedness and profile similarity.we present it this match queries to answerers, but In social potential asker/answerer pair (u , other words, in the next s each clarity and conciseness. way define a probability p(u |u ) that user u i will deliver a uj) we for i j i satisfying answer to user u, ,u q) is defined as a composition j regardless of the question. 3.3 Inde • The We then define the scoring function s(u , u , q) as the com- scoring function s(ui j, The centra i j of position ofprobabilities the two the two probabilities. 
the right use X In order to s(ui , uj , q) = p(ui |uj ) · p(ui |q) = p(ui |uj ) p(ui |t)p(t|q) to learn abo t2T be able to a (2) users uj to w Our goal in the ranking problem is: given a question q Topics. A from user uj , return a ranked list of users ui 2 U that 13 topics known
  • 23. The Model • The goal of the ranking algorithm is: given a question q from user uj, return a ranked list of users ui ∊ U that maximizes the scoring function s(ui, uj, q) • The scoring function allows real-time routing because much of the computation is done offline • The only term which needs to be computed at query-time is p(t|q) • The distribution p(ui|t) assigns users to topics, and the distribution p(ui|uj) defines the Aardvark social graph, both of these are computed by the Indexer at signup time 14
  • 24. Social Crawling • In Aardvark, people form the knowledge base rather than documents • The more active users there are, the more potential answerers • So it is important for Aardvark to create a good experience for users so that they remain active and inclined to invite their friends • The breadth of the Aardvark knowledge base depends upon designing interfaces and algorithms to update the topic lists for each user over time 15
  • 25. Indexing People • The distribution (p(t|uj)) of topics known by user ui is computed from the following sources: - Users can indicate the topics in which they have expertise - User’s friend can select topics for which they trust the user’s opinion - Users can indicate their personal webpages or blogs - User’s status updates from Facebook or Twitter (if available) • Over time, Aardvark learns which topics not to send to a particular user by keeping track of: - when user explicitly “mutes” a topic - declines to answer questions about a topic when given the chance - receives negative feedback on his answer from the asker 16
  • 26. Indexing People level of expertise than if he were alone in his group with knowledge in that area. Mathematically, for some user ui , It is imp requiremen his group of friends U , and some topic t, if p(t|ui ) 6= 0, P understand then s(t|ui ) = p(t|ui ) + u2U p(t|u), where is a small answerer. • Periodically, astopic strengthening algorithm form prob- constant. The values are then renormalized to is used lenge facin • For a user ui and his group of friends U, and for some topic to determi abilities. Aardvark then runs two smoothing algorithms the pur- seeking (i.e t, if p(t|ui) ≠ 0, then pose of which are to record the possibility that the user may mation nee be able to answer questions about additional topics not ex- a given w plicitly recorded in her profile.+The u ∊ U p(t|u) s(t|ui) = p(t|ui) n∑ first uses basic collabo- contrast, i rative filtering techniques on topics (i.e., based on users with who has th similar topics), the n is a small semantic similarity2 . where second uses constant an answer Once all of these bootstrap, extraction, and smoothing man intell • Other smoothing techniques are used to record the methods are applied, we have a list of topics and scores asker can possibility that a user might be able topic scoresabout for a given user. Normalizing these to answer so that and the h P additional itopics not explicitly mentioned in their profile derstandin t2T p(t|u ) = 1, we have a probability distribution for topics known by user ui . Using Bayes’ Law, we compute for voice, sens • p(ui|t) is computed using Bayes’ law each topic and user: forth, to d in a respo p(t|ui )p(ui ) p(ui |t) = , (3) a social se p(t) question th using a uniform distribution for p(ui ) and observed topic 17 knowledge
  • 27. Connectedness • Connectedness between users p(ui|uj) is computed using a weighted cosine similarity over the following feature set: - Social connection - Demographic similarity - Profile similarity - Vocabulary match - Chattiness match - Verbosity match - Politeness match - Speed match • p(ui|uj) is stored in the social graph 18
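A weighted cosine similarity can be sketched as follows. The slide does not say how the features are encoded or weighted, so this assumes each user is represented by a numeric feature vector and that the weights are non-negative; the function name and argument layout are hypothetical.

```python
import math


def weighted_cosine(a, b, weights):
    """Cosine similarity between vectors a and b with per-feature weights.

    Assumes equal-length sequences and non-negative weights.
    Returns 0.0 when either vector has zero weighted norm.
    """
    dot = sum(w * x * y for w, x, y in zip(weights, a, b))
    norm_a = math.sqrt(sum(w * x * x for w, x in zip(weights, a)))
    norm_b = math.sqrt(sum(w * y * y for w, y in zip(weights, b)))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Raising the weight of one feature (for example, social connection) makes agreement on that feature count more toward the final similarity than agreement on the others.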
  • 28. Analyzing Questions • The main goal of the Question Analyzer is, given a question q, to determine a scored list of topics p(t|q) • The following classifiers are run on each question: - NonQuestion Classifier - InappropriateQuestion Classifier - TrivialQuestion Classifier - LocationSensitive Classifier • Next, the list of relevant topics is produced by merging the outputs of several TopicMapper algorithms: - KeywordMatchTopicMapper - TaxonomyTopicMapper - SalientTermTopicMapper - UserTagTopicMapper 19
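The two stages on this slide (classify, then merge topic mappers) might be wired together roughly as below. This is a toy sketch, not Aardvark's implementation: the classifier and mappers are hypothetical stand-ins, treating a classifier as a reject filter is an assumption, and merging by summing scores and normalizing is one plausible choice among many.

```python
def analyze_question(question, filters, topic_mappers):
    """Run reject filters, then merge topic-mapper outputs into scores for p(t|q)."""
    for name, rejects in filters:
        if rejects(question):
            return {"accepted": False, "rejected_by": name, "topics": {}}
    merged = {}
    for mapper in topic_mappers:
        for topic, score in mapper(question).items():
            merged[topic] = merged.get(topic, 0.0) + score
    total = sum(merged.values())
    topics = {t: s / total for t, s in merged.items()} if total else {}
    return {"accepted": True, "rejected_by": None, "topics": topics}


# Hypothetical stand-ins for a classifier and two topic mappers.
def looks_like_non_question(question):
    return "?" not in question


def keyword_mapper(question):
    return {"restaurants & bars": 1.0} if "restaurant" in question.lower() else {}


def place_mapper(question):
    return {"san francisco": 0.5} if "san francisco" in question.lower() else {}
```

Calling `analyze_question(q, [("NonQuestion", looks_like_non_question)], [keyword_mapper, place_mapper])` returns either a rejection record or a normalized topic-score dictionary.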
  • 29. Analyzing Questions • The TopicMapper algorithms are continuously evaluated • For a given question, both the returned topics used to select an answerer and a much larger list of possibly relevant topics are assigned scores by two human judges • 89% precision and 84% recall of relevant topics 20
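For reference, precision and recall over topic sets can be computed as in this small sketch. The function name and the set-based representation are assumptions; the slide does not describe how the judges' scores were turned into a relevant/not-relevant decision.

```python
def precision_recall(returned_topics, relevant_topics):
    """Precision = hits / returned, recall = hits / relevant, over topic sets."""
    returned = set(returned_topics)
    relevant = set(relevant_topics)
    hits = len(returned & relevant)
    precision = hits / len(returned) if returned else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

Precision measures how many of the returned topics the judges considered relevant; recall measures how many of the relevant topics were actually returned.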
  • 30. Ranking Algorithm • The topic list generated by the Question Analyzer is sent to the Routing Engine, which then determines the top answerers for the given question • The main factors that determine the ranking of users are: - Topic expertise p(ui|q) - Connectedness p(ui|uj) - Availability • From this ordered list of users, the Routing Engine then filters out users who should not be contacted: - based on their preferred time of contact - based on how often they have been contacted in the recent past 21
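A minimal sketch of the rank-then-filter flow described on this slide follows. Combining the three factors by multiplication, the `Candidate` fields, and the `max_recent_contacts` threshold are all assumptions made for illustration; the slide lists the factors but not how they are combined.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expertise: float          # stands in for p(ui|q)
    connectedness: float      # stands in for p(ui|uj)
    availability: float       # hypothetical score in [0, 1]
    in_preferred_hours: bool  # is now within the user's preferred contact time?
    recent_contacts: int      # how many times contacted recently


def rank_answerers(candidates, max_recent_contacts=3):
    """Order candidates by a combined score, then drop those not to be contacted."""
    ordered = sorted(
        candidates,
        key=lambda c: c.expertise * c.connectedness * c.availability,
        reverse=True,
    )
    return [
        c for c in ordered
        if c.in_preferred_hours and c.recent_contacts < max_recent_contacts
    ]
```

Filtering after ranking mirrors the order on the slide: a highly ranked expert can still be skipped if it is outside their preferred hours or they have been contacted too often.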
  • 31. User Interface • The various user interfaces of Aardvark are built on top of the real time communication channels such as IM, email, SMS, iPhone, Twitter and Web-based messaging 22
  • 34. Outline ✓ Overview ✓ Anatomy ➡ Examples • Analysis • Evaluation • Discussion 25
  • 35. Examples
EXAMPLE 1
(Question from Mark C./M/LosAltos,CA) I am looking for a restaurant in San Francisco that is open for lunch. Must be very high-end and fancy (this is for a small, formal, post-wedding gathering of about 8 people).
(+4 minutes -- Answer from Nick T./28/M/SanFrancisco,CA -- a friend of your friend Fritz Schwartz) fringale (fringalesf.com) in soma is a good bet; small, fancy, french (the french actually hang out there too). Lunch: Tuesday - Friday: 11:30am - 2:30pm
(Reply from Mark to Nick) Thanks Nick, you are the best PM ever!
(Reply from Nick to Mark) you're very welcome. hope the days they're open for lunch work...
EXAMPLE 2
(Question from James R./M/TwinPeaksWest,SF) What is the best new restaurant in San Francisco for a Monday business dinner? Fish & Farm? Gitane? Quince (a little older)?
(+7 minutes -- Answer from Paul D./M/SanFrancisco,CA -- A friend of your friend Sebastian V.) For business dinner I enjoyed Kokkari Estiatorio at 200 Jackson. If you prefer a place in SOMA i recommend Ozumo (a great sushi restaurant).
(Reply from James to Paul) thx I like them both a lot but I am ready to try something new
(+1 hour -- Answer from Fred M./29/M/Marina,SF) Quince is a little fancy... La Mar is pretty fantastic for cevice - like the Slanted Door of peruvian food...
EXAMPLE 3
(Question from Brian T./22/M/Castro,SF) What is a good place to take a spunky, off-the-cuff, social, and pretty girl for a nontraditional, fun, memorable dinner date in San Francisco?
(+4 minutes -- Answer from Dan G./M/SanFrancisco,CA) Start with drinks at NocNoc (cheap, beer/wine only) and then dinner at RNM (expensive, across the street).
(Reply from Brian to Dan) Thanks!
(+6 minutes -- Answer from Anthony D./M/Sunnyvale,CA -- you are both in the Google group) Take her to the ROTL production of Tommy, in the Mission. Best show i've seen all year!
(Reply from Brian to Anthony) Tommy as in the Who's rock opera? COOL!
(+10 minutes -- Answer from Bob F./M/Mission,SF -- you are connected through Mathias' friend Samantha S.) Cool question. Spork is usually my top choice for a first date, because in addition to having great food and good really friendly service, it has an atmosphere that's perfectly in between casual and romantic. It's a quirky place, interesting funny menu, but not exactly non-traditional in the sense that you're not eating while suspended from the ceiling or anything 26
  • 36. Examples 27
  • 37. Examples 28
  • 38. Outline ✓ Overview ✓ Anatomy ✓ Examples ➡ Analysis • Evaluation • Discussion 29
  • 39. Analysis • As of October 2009, Aardvark had 90,361 registered users • The average query volume was 3,167.2 questions per day in this period • [Figure: chart labeled "Users"] 30
  • 40. Analysis • Mobile users were particularly active - It is easier to reply to questions in the form of IM or SMS on a phone - People are more comfortable using natural language in an IM setting than in a web search setting • Questions are highly contextualized - Average query length is 18.6 words • Questions often have a subjective element • [Figure: categories of questions sent to Aardvark: websites & internet apps; business research; music, movies, TV, books; sports & recreation; home & cooking; finance & investing; technology & programming; miscellaneous; Aardvark; local services; travel; product reviews & help; restaurants & bars] 31
  • 41. Analysis • Questions get answered quickly • [Figure 8: Categories of questions sent to Aardvark] • [Figure 9: Distribution of questions and answering times; y-axis: Questions Answered (×10^4); x-axis bins: 0−3 min, 3−6 min, 6−12 min, 12−30 min, 30 min−1 hr, 1−4 hr, 4+ hr] • Answers are of high quality - Answers are comprehensive and concise - Median answer length was 22.2 words - 70.4% of inline feedback rated answers as ‘good’, 14.1% rated them as ‘OK’ and 15.5% rated them as ‘bad’ 32
  • 42. Analysis • There is a broad range of answerers • Social proximity matters • People are indexable 33
  • 43. Outline ✓ Overview ✓ Anatomy ✓ Examples ✓ Analysis ➡ Evaluation • Discussion 34
  • 44. Evaluation • Compared to Google! • “Do you want to help Aardvark run an experiment?” was inserted into a random sample of active questions • Users were asked to reformulate their question as a query and search on Google • Users timed how long it took to find a satisfactory result and also rated the quality of the answers • 71.5% of queries were answered successfully on Aardvark, with a mean rating of 3.93 • 70.5% of queries were answered successfully on Google, with a mean rating of 3.07 35
  • 45. Outline ✓ Overview ✓ Anatomy ✓ Examples ✓ Analysis ✓ Evaluation ➡ Discussion 36
  • 46. Discussion • Participation Fatigue: (Pg 9) “86.7% of users have been contacted by Aardvark with a request to answer a question, and of those, 70% have looked at the question and 38% could answer a question. 20% of the users accounted for 85% of answers” What happens when this thin slice of users gets overwhelmed and starts dropping out? • Availability: There can be cases when the topic expert(s) in your social graph are not online. Do you think having an “offline” mode would be helpful? • Evaluation: Could we get a better understanding of how well Aardvark works if it were compared to another social search engine built on the same paradigm? How could that be achieved? 37