The Anatomy of a Large-Scale Social Search Engine

Damon Horowitz (Aardvark, damon@aardvarkteam.com)
Sepandar D. Kamvar (Stanford University, sdkamvar@stanford.edu)
ABSTRACT
We present Aardvark, a social search engine. With Aardvark, users ask a question, either by instant message, email, web input, text message, or voice. Aardvark then routes the question to the person in the user's extended social network most likely to be able to answer that question. As compared to a traditional web search engine, where the challenge lies in finding the right document to satisfy a user's information need, the challenge in a social search engine like Aardvark lies in finding the right person to satisfy a user's information need. Further, while trust in a traditional search engine is based on authority, in a social search engine like Aardvark, trust is based on intimacy. We describe how these considerations inform the architecture, algorithms, and user interface of Aardvark, and how they are reflected in the behavior of Aardvark users.

1. INTRODUCTION

1.1 The Library and the Village
   Traditionally, the basic paradigm in information retrieval has been the library. Indeed, the field of IR has roots in the library sciences, and Google itself came out of the Stanford Digital Library project [13]. While this paradigm has clearly worked well in several contexts, it ignores another age-old model for knowledge acquisition, which we shall call "the village paradigm". In a village, knowledge dissemination is achieved socially — information is passed from person to person, and the retrieval task consists of finding the right person, rather than the right document, to answer your question.
   The differences in how people find information in a library versus a village suggest some useful principles for designing a social search engine. In a library, people use keywords to search, the knowledge base is created by a small number of content publishers before the questions are asked, and trust is based on authority. In a village, by contrast, people use natural language to ask questions, answers are generated in real-time by anyone in the community, and trust is based on intimacy. These properties have cascading effects — for example, real-time responses from socially proximal responders tend to elicit (and work well for) highly contextualized and subjective queries. For example, the query "Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I'm looking for somebody that won't let them watch TV." is better answered by a friend than the library. These differences in information retrieval paradigm require that a social search engine have very different architecture, algorithms, and user interfaces than a search engine based on the library paradigm.
   The fact that the library and the village paradigms of knowledge acquisition complement one another nicely in the offline world suggests a broad opportunity on the web for social information retrieval.

1.2 Aardvark
   In this paper, we present Aardvark, a social search engine based on the village paradigm. We describe in detail the architecture, ranking algorithms, and user interfaces in Aardvark, and the design considerations that motivated them. We believe this to be useful to the research community for two reasons. First, the argument made in the original Anatomy paper [3] still holds true — since most search engine development is done in industry rather than academia, the research literature describing end-to-end search engine architecture is sparse. Second, the shift in paradigm opens up a number of interesting research questions in information retrieval, for example around human expertise classification, implicit network construction, and conversation design.
   Following the architecture description, we present a statistical analysis of usage patterns in Aardvark. We find that, as compared to traditional search, Aardvark queries tend to be long, highly contextualized, and subjective — in short, they tend to be the types of queries that are not well-serviced by traditional search engines. We also find that the vast majority of questions get answered promptly and satisfactorily, and that users are surprisingly active, both in asking and answering.
   Finally, we present example results from the current Aardvark system, and a comparative evaluation experiment. What we find is that Aardvark performs very well on queries that deal with opinion, advice, experience, or recommendations, while traditional corpus-based search engines remain a good choice for queries that are factual or navigational and lack a subjective element.

Copyright is held by the author/owner(s).
WWW2010, April 26-30, 2010, Raleigh, North Carolina.

2. OVERVIEW

2.1 Main Components
   The main components of Aardvark are:

  1. Crawler and Indexer. To find and label resources that contain information — in this case, users, not documents (Sections 3.2 and 3.3).
  2. Query Analyzer. To understand the user's information need (Section 3.4).

  3. Ranking Function. To select the best resources to provide the information (Section 3.5).

  4. UI. To present the information to the user in an accessible and interactive form (Section 3.6).

[Figure 1 appears here: gateways (IM, email, Twitter, iPhone, web, SMS) feed a Transport Layer that passes messages to the Conversation Manager, which consults the Question Analyzer's classifiers and the Routing Engine; these draw on the Index and a database of users, graph, topics, and content, populated by importers (web content, Facebook, LinkedIn, etc.).]

   Most corpus-based search engines have similar key components with similar aims [3], but the means of achieving those aims are quite different.
   Before discussing the anatomy of Aardvark in depth, it is useful to describe what happens behind the scenes when a new user joins Aardvark and when a user asks a question.

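To make the division of labor concrete, the following minimal Python sketch shows how an inverted index (the Crawler and Indexer's output), a topic classifier (the Query Analyzer), and a scoring function (the Ranking Function) might compose; it anticipates the scoring model detailed in Section 3.1. All names, data structures, and scores here are hypothetical illustrations, not Aardvark's actual implementation:

```python
import re
from collections import defaultdict

# Hypothetical Crawler/Indexer output: topic -> {user: p(u|t)} scores.
INVERTED_INDEX = {
    "sushi": {"alice": 0.9, "bob": 0.4},
    "austin": {"bob": 0.8},
    "babysitting": {"carol": 0.7},
}

# Hypothetical Social Graph: asker -> {candidate answerer: p(u_i|u_j)}.
SOCIAL_GRAPH = {
    "dave": {"alice": 0.5, "bob": 0.3, "carol": 0.2},
}

def analyze_question(question):
    """Query Analyzer stand-in: map a question to a topic distribution p(t|q)."""
    words = set(re.findall(r"[a-z]+", question.lower()))
    hits = {t: 1.0 for t in INVERTED_INDEX if t in words}
    total = sum(hits.values()) or 1.0
    return {t: w / total for t, w in hits.items()}

def rank_answerers(asker, question):
    """Ranking Function stand-in:
    s(u_i, u_j, q) = p(u_i|u_j) * sum_t p(u_i|t) * p(t|q)."""
    p_t_given_q = analyze_question(question)
    relevance = defaultdict(float)  # query-dependent part, sum_t p(u|t) p(t|q)
    for topic, p_tq in p_t_given_q.items():
        for user, p_ut in INVERTED_INDEX[topic].items():
            relevance[user] += p_ut * p_tq
    affinity = SOCIAL_GRAPH.get(asker, {})  # query-independent part, p(u_i|u_j)
    scored = [(affinity.get(u, 0.0) * rel, u) for u, rel in relevance.items()]
    return sorted(scored, reverse=True)
```

Here the asker's social affinity multiplies a topic-based relevance sum, mirroring the query-independent/query-dependent split of the scoring function in Section 3.1.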
Figure 1: Schematic of the architecture of Aardvark

2.2 The Initiation of a User
   When a new user first joins Aardvark, the Aardvark system performs a number of indexing steps in order to be able to direct the appropriate questions to her for answering.
   Because questions in Aardvark are routed to the user's extended network, the first step involves indexing friendship and affiliation information. The data structure responsible for this is the Social Graph. Aardvark's aim is not to build a social network, but rather to allow people to make use of their existing social networks. As such, in the sign-up process, a new user has the option of connecting to a social network such as Facebook or LinkedIn, importing their contact lists from a webmail program, or manually inviting friends to join. Additionally, anybody whom the user invites to join Aardvark is appended to their Social Graph — and such invitations are a major source of new users. Finally, Aardvark users are connected through common "groups" which reflect real-world affiliations they have, such as the schools they have attended and the companies they have worked at; these groups can be imported automatically from social networks, or manually created by users. Aardvark indexes this information and stores it in the Social Graph, which is a fixed-width ISAM index sorted by userId.
   Simultaneously, Aardvark indexes the topics about which the new user has some level of knowledge or experience. This topical expertise can be garnered from several sources: a user can indicate topics in which he believes himself to have expertise; a user's friends can indicate which topics they trust the user's opinions about; a user can specify an existing structured profile page from which the Topic Parser parses additional topics; a user can specify an account on which they regularly post status updates (e.g., Twitter or Facebook), from which the Topic Extractor extracts topics (from unstructured text) on an ongoing basis (see Section 3.3 for more discussion); and finally, Aardvark observes the user's behavior on Aardvark, in answering (or electing not to answer) questions about particular topics.
   The set of topics associated with a user is recorded in the Forward Index, which stores each userId, a scored list of topics, and a series of further scores about a user's behavior (e.g., pertaining to their responsiveness and the quality of their answers).
   From the Forward Index, Aardvark constructs an Inverted Index. The Inverted Index stores each topicId and a scored list of userIds that have expertise in that topic. In addition to topics, the Inverted Index stores scored lists of userIds for features like answer quality and response time.
   Once the Inverted Index and Social Graph for a user are created, the user is now active on the system and ready to ask her first question.

2.3 The Life of a Query
   A user can access Aardvark through many different means, the most common being instant message (on the desktop) or text message (on the mobile phone). A user begins by asking a question. The question gets sent from the input device to the Transport Layer, where it is normalized to a Message data structure, and sent to the Conversation Manager. Once the Conversation Manager determines that the message is a question, it sends the question to the Question Analyzer to determine the appropriate classifications and topics for the question. The Conversation Manager informs the asker which primary topic was determined for the question, and gives the asker the opportunity to edit it. It simultaneously issues a Routing Suggestion Request to the Routing Engine. The Routing Engine plays a role analogous to the ranking function in a corpus-based search engine. It accesses the Inverted Index and Social Graph for a list of candidate answerers, and then ranks them to reflect how well it believes they can answer the question, and how good of a match they are for the asker. The Routing Engine then returns a ranked list of Routing Suggestions to the Conversation Manager. The Conversation Manager contacts the potential answerers — one by one, or a few at a time, depending upon a Routing Policy — and asks them if they would like to answer the question, until a satisfactory answer is found. The Conversation Manager then forwards this answer along to the asker, and allows the asker and answerer to exchange further follow-up messages if they choose.

3. ANATOMY

3.1 The Model
   The core of Aardvark is a statistical model for routing questions to potential answerers. We use a network variant of what has been called an aspect model [8], that has two primary features. First, it associates an unobserved class variable t ∈ T = {t1, ..., tn} with each observation (i.e., the successful answer of question q by user ui). In other words, the probability p(ui|q) that user i will successfully answer question q depends on whether q is about the topics t in
which ui has expertise:

    p(ui|q) = Σ_{t∈T} p(ui|t) p(t|q)    (1)

   The second main feature of the model is that it defines a query-independent probability of success for each potential asker/answerer pair (ui, uj), based upon their degree of social connectedness and profile similarity. In other words, we define a probability p(ui|uj) that user ui will deliver a satisfying answer to user uj, regardless of the question. This is reminiscent of earlier work in search in P2P networks, where queries have been routed over relationship-based overlay networks [4].
   We then define the scoring function s(ui, uj, q) as the composition of the two probabilities:

    s(ui, uj, q) = p(ui|uj) · p(ui|q) = p(ui|uj) Σ_{t∈T} p(ui|t) p(t|q)    (2)

   Our goal in the ranking problem is: given a question q from user uj, return a ranked list of users ui ∈ U that maximizes s(ui, uj, q).
   Note that the scoring function is composed of a query-dependent relevance score p(ui|q) and a query-independent quality score p(ui|uj). This bears similarity to the ranking functions of traditional corpus-based search engines such as Google [3]. The difference is that unlike quality scores like PageRank [13], Aardvark's quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark's relevance score aims to measure a user's potential to answer a query, rather than a document's existing capability to answer a query.
   Computationally, this scoring function has a number of advantages. It allows real-time routing because it pushes much of the computation offline. The only component probability that needs to be computed at query time is p(t|q). Computing p(t|q) is equivalent to assigning topics to a question — in Aardvark we do this by running a probabilistic classifier on the question at query time (see Section 3.4). The distribution p(ui|t) assigns users to topics, and the distribution p(ui|uj) defines the Aardvark Social Graph. Both of these are computed by the Indexer at signup time, and then updated continuously in the background as users answer questions and get feedback (see Section 3.3). The component multiplications and sorting are also done at query time, but these are easily parallelizable, as the index is sharded by user.

3.2 Social Crawling
   A comprehensive knowledge base is important for search engines, as query distributions tend to have a long tail [9]. In corpus-based search engines, this is achieved by large-scale crawlers and thoughtful crawl policies. In Aardvark, the knowledge base consists of people rather than documents, so the methods for acquiring and expanding a comprehensive knowledge base are quite different.
   With Aardvark, the more active users there are, the more potential answerers there are, and therefore the more comprehensive the coverage. More importantly, because Aardvark looks for answerers primarily within a user's extended social network, the denser the network, the larger the effective knowledge base.
   This suggests that the strategy for increasing the knowledge base of Aardvark crucially involves creating a good experience for users so that they remain active and are inclined to invite their friends. An extended discussion of this is outside of the scope of this paper; we mention it here only to emphasize the difference in the nature of "crawling" in social search versus traditional search.
   Given a set of active users on Aardvark, the effective breadth of the Aardvark knowledge base depends upon designing interfaces and algorithms that can collect and learn an extended topic list for each user over time, as discussed in the next section.

3.3 Indexing People
   The central technical challenge in Aardvark is selecting the right user to answer a given question from another user. In order to do this, the two main things Aardvark needs to learn about each user ui are: (1) the topics t he might be able to answer questions about, p_smoothed(t|ui); (2) the users uj to whom he is connected, p(ui|uj).
   Topics. Aardvark computes the distribution p(t|ui) of topics known by user ui from the following sources of information:

   • Users are prompted to provide at least three topics which they believe they have expertise about.

   • Friends of a user (and the person who invited a user) are encouraged to provide a few topics that they trust the user's opinion about.

   • Aardvark parses out topics from users' existing online profiles (e.g., Facebook profile pages, if provided). For such pages with a known structure, a simple Topic Parsing algorithm uses regular expressions which were manually devised for specific fields in the pages, based upon their performance on test data.

   • Aardvark automatically extracts topics from unstructured text on users' existing online homepages or blogs, if provided. For unstructured text, a linear SVM identifies the general subject area of the text, while an ad-hoc named entity extractor is run to extract more specific topics, scaled by a variant tf-idf score.

   • Aardvark automatically extracts topics from users' status message updates (e.g., Twitter messages, Facebook news feed items, IM status messages, etc.) and from the messages they send to other users on Aardvark.

   The motivation for using these latter sources of profile topic information is a simple one: if you want to be able to predict what kind of content a user will generate (i.e., p(t|ui)), first examine the content they have generated in the past. In this spirit, Aardvark uses web content not as a source of existing answers about a topic, but rather as an indicator of the topics about which a user is likely able to give new answers on demand.
   In essence, this involves modeling a user as a content generator, with probabilities indicating the likelihood she will respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic. In addition, Aardvark learns over time which topics not to send a user questions about by keeping track of cases when
the user: (1) explicitly "mutes" a topic; (2) declines to answer questions about a topic when given the opportunity; (3) receives negative feedback on his answer about the topic from another user.
   Periodically, Aardvark will run a topic strengthening algorithm, the essential idea of which is: if a user has expertise in a topic and most of his friends also have some expertise in that topic, we have more confidence in that user's level of expertise than if he were alone in his group with knowledge in that area. Mathematically, for some user ui, his group of friends U, and some topic t, if p(t|ui) > 0, then s(t|ui) = p(t|ui) + γ Σ_{u∈U} p(t|u), where γ is a small constant. The s values are then renormalized to form probabilities.
   Aardvark then runs two smoothing algorithms, the purpose of which is to record the possibility that the user may be able to answer questions about additional topics not explicitly recorded in her profile. The first uses basic collaborative filtering techniques on topics (i.e., based on users with similar topics); the second uses semantic similarity.¹

¹ In both the Person Indexing and the Question Analysis components, "semantic similarity" is computed by using an approximation of distributional similarity computed over Wikipedia and other corpora; this serves as a proxy measure of the topics' semantic relatedness.

   Once all of these bootstrap, extraction, and smoothing methods are applied, we have a list of topics and scores for a given user. Normalizing these topic scores so that Σ_{t∈T} p(t|ui) = 1, we have a probability distribution for topics known by user ui. Using Bayes' Law, we compute for each topic and user:

    p(ui|t) = p(t|ui) p(ui) / p(t),    (3)

using a uniform distribution for p(ui) and observed topic frequencies for p(t).
   Aardvark collects these probabilities p(ui|t) indexed by topic into the Inverted Index, which allows for easy lookup when a question comes in.
   Connections. Aardvark computes the connectedness between users p(ui|uj) in a number of ways. While social proximity is very important here, we also take into account similarities in demographics and behavior. The factors considered here include:

   • Social connection (common friends and affiliations)
   • Demographic similarity
   • Profile similarity (e.g., common favorite movies)
   • Vocabulary match (e.g., IM shortcuts)
   • Chattiness match (frequency of follow-up messages)
   • Verbosity match (the average length of messages)
   • Politeness match (e.g., use of "Thanks!")
   • Speed match (responsiveness to other users)

Connection strengths between people are computed using a weighted cosine similarity over this feature set, normalized so that Σ_{ui∈U} p(ui|uj) = 1, and stored in the Social Graph for quick access at query time.
   Both the distributions p(ui|uj) in the Social Graph and p(t|ui) in the Inverted Index are continuously updated as users interact with one another on Aardvark.

3.4 Analyzing Questions
   The purpose of the Question Analyzer is to determine a scored list of topics p(t|q) for each question q, representing the semantic subject matter of the question. This is the only probability distribution in Equation 2 that is computed at query time.
   It is important to note that in a social search system, the requirement for a Question Analyzer is only to be able to understand the query sufficiently for routing it to a likely answerer. This is a considerably simpler task than the challenge facing an ideal web search engine, which must attempt to determine exactly what piece of information the user is seeking (i.e., given that the searcher must translate her information need into search keywords), and to evaluate whether a given web page contains that piece of information. By contrast, in a social search system, it is the human answerer who has the responsibility for determining the relevance of an answer to a question — and that is a function which human intelligence is extremely well-suited to perform! The asker can express his information need in natural language, and the human answerer can simply use her natural understanding of the language of the question, of its tone of voice, sense of urgency, sophistication or formality, and so forth, to determine what information is suitable to include in a response. Thus, the role of the Question Analyzer in a social search system is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers.
   As a first step, the following classifiers are run on each question: A NonQuestionClassifier determines if the input is not actually a question (e.g., is it a misdirected message, a sequence of keywords, etc.); if so, the user is asked to submit a new question. An InappropriateQuestionClassifier determines if the input is obscene, commercial spam, or otherwise inappropriate content for a public question-answering community; if so, the user is warned and asked to submit a new question. A TrivialQuestionClassifier determines if the input is a simple factual question which can be easily answered by existing common services (e.g., "What time is it now?", "What is the weather?", etc.); if so, the user is offered an automatically generated answer resulting from traditional web search. A LocationSensitiveClassifier determines if the input is a question which requires knowledge of a particular location, usually in addition to specific topical knowledge (e.g., "What's a great sushi restaurant in Austin, TX?"); if so, the relevant location is determined and passed along to the Routing Engine with the question.
   Next, the list of topics relevant to a question is produced by merging the output of several distinct TopicMapper algorithms, each of which suggests its own scored list of topics:

   • A KeywordMatchTopicMapper passes any terms in the question which are string matches with user profile topics through a classifier which is trained to determine whether a given match is likely to be semantically significant or misleading.²

² For example, if the string "camel wrestling" occurs in a question, it is likely to be semantically relevant to a user who has "camel wrestling" as a profile topic; whereas the string "running" is too ambiguous to use in this manner without further validation, since it might errantly route a question about "running a business" to a user who knows about fitness.
   • A TaxonomyTopicMapper classifies the question text into a taxonomy of roughly 3000 popular question topics, using an SVM trained on an annotated corpus of several million questions.

   • A SalientTermTopicMapper extracts salient phrases from the question — using a noun-phrase chunker and a tf-idf–based measure of importance — and finds semantically similar user topics.

   • A UserTagTopicMapper takes any user "tags" provided by the asker (or by any would-be answerers), and maps these to semantically-similar user topics.³

   At present, the output distributions of these classifiers are combined by weighted linear combination. It would be interesting future work to explore other means of combining heterogeneous classifiers, such as the maximum entropy model described in [11].
   The Aardvark TopicMapper algorithms are continuously evaluated by manual scoring on random samples of 1000 questions. The topics used for selecting candidate answerers, as well as a much larger list of possibly relevant topics, are assigned scores by two human judges, with a third judge

[...]

sense of connection and similarity, and meet each other's expectations for conversational behavior in the interaction.
   Availability: Third, the Routing Engine prioritizes candidate answerers in such a way so as to optimize the chances that the present question will be answered, while also preserving the available set of answerers (i.e., the quantity of "answering resource" in the system) as much as possible by spreading out the answering load across the user base. This involves factors such as prioritizing users who are currently online (e.g., via IM presence data, iPhone usage, etc.), who are historically active at the present time-of-day, and who have not been contacted recently with a request to answer a question.
   Given this ordered list of candidate answerers, the Routing Engine then filters out users who should not be contacted, according to Aardvark's guidelines for preserving a high-quality user experience. These filters operate largely as a set of rules: do not contact users who prefer to not be contacted at the present time of day; do not contact users who have recently been contacted as many times as their contact frequency settings permit; etc.
   Since this is all done at query time, and the set of candidate answerers can potentially be very large, it is useful to note that this process is parallelizable. Each shard in the
adjudicating disagreements. For the current algorithms on
                                                                 Index computes its own ranking for the users in that shard,
the current sample of questions, this process yields overall
                                                                 and sends the top users to the Routing Engine. This is scal-
scores of 89% precision and 84% recall of relevant topics.
                                                                 able as the user base grows, since as more users are added,
In other words, 9 out of 10 times, Aardvark will be able
                                                                 more shards can be added.
to route a question to someone with relevant topics in her
                                                                    The list of candidate answerers who survive this filtering
profile; and Aardvark will identify 5 out of every 6 possibly
                                                                 process are returned to the Conversation Manager. The
relevant answerers for each question based upon their topics.
                                                                 Conversation Manager then proceeds with opening channels
3.5 The Aardvark Ranking Algorithm                               to each of them, serially, inquiring whether they would like
                                                                 to answer the present question; and iterating until an answer
   Ranking in Aardvark is done by the Routing Engine, which
                                                                 is provided and returned to the asker.
determines an ordered list of users (or “candidate answer-
ers”) who should be contacted to answer a question, given        3.6 User Interface
the asker of the question and the information about the ques-
                                                                    Since social search is modeled after the real-world process
tion derived by the Question Analyzer. The core ranking
                                                                 of asking questions to friends, the various user interfaces for
function is described by equation 2; essentially, the Routing
                                                                 Aardvark are built on top of the existing communication
Engine can be seen as computing equation 2 for all candidate
                                                                 channels that people use to ask questions to their friends:
answerers, sorting, and doing some postprocessing.
                                                                 IM, email, SMS, iPhone, Twitter, and Web-based messag-
   The main factors that determine this ranking of users are
                                                                 ing. Experiments were also done using actual voice input
Topic Expertise p(ui |q), Connectedness p(ui |uj ), and Avail-
                                                                 from phones, but this is not live in the current Aardvark
ability:
                                                                 production system.
   Topic Expertise: First, the Routing Engine finds the
                                                                    In its simplest form, the user interface for asking a ques-
subset of users who are semantic matches to the question:
those users whose profile topics indicate expertise relevant to
the topics which the question is about. Users whose profile
topics are closer matches to the question’s topics are given
higher rank. For questions which are location-sensitive (as
defined above), only users with matching locations in their
profiles are considered.
   Connectedness: Second, the Routing Engine scores each
user according to the degree to which she herself — as a
person, independently of her topical expertise — is a good
“match” for the asker for this information query. The goal
of this scoring is to optimize the degree to which the asker
and the answerer feel kinship and trust, arising from their
3
  A general principle in the design of Aardvark is to use hu-
man intelligence wherever possible to improve the quality of
the system. For the present task of Question Analysis, this
involves giving both askers and answerers prompts and sim-
ple commands for telling Aardvark directly what the subject
matter of a question is.                                              Figure 2: Screenshot of Aardvark over IM
aardvark: You there? I have a question about **tennis** that I think
  you might be able to answer. (Type 'sure', 'pass'‚ or 'busy'.)
    me: sure
  aardvark: (From Michael B./29/M/NewYork,US)
  Does anyone know which health clubs in Manhattan have tennis
  courts? Or are all the good options in the other boroughs?
    (Type 'pass' to skip, or 'more' for extra options.)
    me: why

  aardvark: I thought of you for Michael's question because
    - Michael is a friend of your friend Sally Johnson.
    - You know a lot about **racquet sports** and **fitness**
    - You and he have similar interests in **sports**
    (Type 'more' or change settings at http://vark.com/a/XXXXX)

    me: Well there is always the Midtown Tennis Club on 8th ave @27th
    if you really want to stay in manhattan -- but the quality isn't great.
    You'd do just as well to use the public courts in Central Park. Or
    another good option is to join NYHRC or NYSC in manhattan, and
    use their courts in other boroughs...

  aardvark: Great -- I've sent that to Michael. Thanks for the fast
  answer! (Type 'Michael:' followed by a message to add something, or
  'more' for options.)


                                                                              Figure 4: Screenshot of Aardvark Answering Tab on
Figure 3: Example of Aardvark interacting with an                             iPhone
answerer

                                                                              quite unusual and innovative behavior for a search engine,
tion on Aardvark is any kind of text input mechanism, along                   and it is a key to Aardvark’s ability to find answers to a wide
with a mechanism for displaying textual messages returned                     range of questions: the available set of potential answerers
from Aardvark. (This kind of very lightweight interface is                    is not just whatever users happen to be visiting a bulletin
important for making the search service available anywhere,                   board at the time a question is posted, but rather, the entire
especially now that mobile device usage is ubiquitous across                  set of users that Aardvark has contact information for. Be-
most of the globe.)                                                           cause this kind of “reaching out” to users has the potential
   However, Aardvark is most powerful when used through                       to become an unwelcome interruption if it happens too fre-
a chat-like interface that enables ongoing conversational in-                 quently, Aardvark sends such requests for answers usually
teraction. A private 1-to-1 conversation creates an intimacy                  less than once a day to a given user (and users can easily
which encourages both honesty and freedom within the con-                     change their contact settings, specifying prefered frequency
straints of real-world social norms. (By contrast, answering                  and time-of-day for such requests). Further, users can ask
forums where there is a public audience can both inhibit                      Aardvark “why” they were selected for a particular ques-
potential answerers [12] or or motivate public performance                    tion, and be given the option to easily change their profile
rather than authentic answering behavior [17].) Further,                      if they do not want such questions in the future. This is
in a real-time conversation, it is possible for an answerer                   very much like the real-world model of social information
to request clarifying information from the asker about her                    sharing: the person asking a question, or the intermediary
question, or for the asker to follow-up with further reactions                in Aardvark’s role, is careful not to impose too much upon
or inquiries to the answerer.                                                 a possible answerer. The ability to reach out to an extended
   There are two main interaction flows available in Aard-                     network beyond a user’s immediate friendships, without im-
vark for answering a question. The primary flow involves                       posing too frequently on that network, provides a key differ-
Aardvark sending a user a message (over IM, email, etc.),                     entiating experience from simply posting questions to one’s
asking if she would like to answer a question: for exam-                      Twitter or Facebook status message.
ple, “You there? A friend from the Stanford group has a                          A secondary flow of answering questions is more similar to
question about *search engine optimization* that I think                      traditional bulletin-board style interactions: a user sends a
you might be able to answer.”. If the user responds affir-                      message to Aardvark (e.g., “try”) or visits the “Answering”
matively, Aardvark relays the question as well as the name                    tab of Aardvark website or iPhone application (Figure 4),
of the questioner. The user may then simply type an an-                       and Aardvark shows the user a recent question from her
swer the question, type in a friend’s name or email address                   network which has not yet been answered and is related to
to refer it to someone else who might answer, or simply                       her profile topics. This mode involves the user initiating
“pass” on this request.4 While this may seem like an en-                      the exchange when she is in the mood to try to answer a
tirely quotidian interaction between friends, it is actually                  question; as such, it has the benefit of an eager potential
4
                                                                              answerer – but as the only mode of answering it does not
  There is no shame in “passing” on a question, since nobody                  effectively tap into the full diversity of the user base (since
else knows that the question was sent to you. Similarly,
there is no social cost to the user in asking a question, since               most users do not initiate these episodes). This is an im-
(unlike in the real-world) you are not directly imposing on a                 portant point: while almost everyone is happy to answer
friend or requesting a favor; rather, Aardvark plays the role                 questions (see Section 5) to help their friends or people they
of the intermediary who bears this social cost.                               are connected to, not everyone goes out of their way to do so.
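The routing computation of Section 3.5 — combining TopicMapper output distributions by weighted linear combination, then ranking candidates by Topic Expertise, Connectedness, and Availability before applying filter rules — can be sketched as follows. This is a simplified illustration only: the data layout, the multiplicative scoring, and all weights and field names are assumptions for the sketch, not Aardvark's actual implementation (the paper's equation 2 defines the real scoring function).

```python
# Illustrative sketch of the Routing Engine's scoring pipeline.
# All structures and weights here are invented for exposition.

def combine_topic_mappers(mapper_outputs, weights):
    """Weighted linear combination of per-TopicMapper topic distributions."""
    combined = {}
    for dist, w in zip(mapper_outputs, weights):
        for topic, p in dist.items():
            combined[topic] = combined.get(topic, 0.0) + w * p
    return combined

def routing_scores(question_topics, asker_id, candidates):
    """Rank candidate answerers by expertise * connectedness * availability,
    then apply rule-based contact filters."""
    scored = []
    for user in candidates:
        # Topic Expertise: overlap between question topics and profile topics
        expertise = sum(p * user["profile_topics"].get(t, 0.0)
                        for t, p in question_topics.items())
        if expertise == 0.0:
            continue  # not a semantic match for this question
        # Connectedness: strength of the social tie to the asker
        connectedness = user["connections"].get(asker_id, 0.0)
        # Availability: crude stand-in for presence/activity signals
        availability = 1.0 if user["online"] else 0.1
        scored.append((expertise * connectedness * availability, user))
    # Filter rules: respect each user's contact-frequency settings
    allowed = [(s, u) for s, u in scored
               if u["contacts_today"] < u["max_contacts_per_day"]]
    return sorted(allowed, key=lambda pair: -pair[0])

# Toy example: two topically relevant users, one of them offline
question_topics = combine_topic_mappers(
    [{"tennis": 1.0}, {"tennis": 0.5, "fitness": 0.5}],  # two mapper outputs
    [0.6, 0.4])                                          # mapper weights
candidates = [
    {"name": "A", "online": True,  "contacts_today": 0, "max_contacts_per_day": 1,
     "profile_topics": {"tennis": 1.0}, "connections": {"asker": 0.5}},
    {"name": "B", "online": False, "contacts_today": 0, "max_contacts_per_day": 1,
     "profile_topics": {"tennis": 1.0}, "connections": {"asker": 0.9}},
    {"name": "C", "online": True,  "contacts_today": 0, "max_contacts_per_day": 1,
     "profile_topics": {"cooking": 1.0}, "connections": {"asker": 1.0}},
]
ranked = routing_scores(question_topics, "asker", candidates)
```

In the sharded arrangement the paper describes, each Index shard would run something like `routing_scores` over its own users and forward only its top candidates to the Routing Engine for the final merge.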
"What fun bars downtown have outdoor seating?"

"I'm just getting into photography. Any suggestions for a digital camera that would be easy enough for me to use as a beginner, but I'll want to keep using for a while?"

"I'm going to Berlin for two weeks and would like to take some day trips to places that aren't too touristy. Where should I go?"

"My friend's in town and wants to see live music. We both love bands like the Counting Crows. Any recommendations for shows (of any size) to check out?"

"Is there any way to recover an unsaved Excel file that was closed manually on a Mac?"

"I'm putting together a focus group to talk about my brand new website. Any tips on making it as effective as possible?"

"I'm making cookies but ran out of baking powder. Is there anything I can substitute?"

"I have a job interview over lunch tomorrow. Is there any interview restaurant etiquette that I should know?"

"I want to give my friend something that lasts as a graduation present, but someone already gave her jewelry. What else could I give her?"

"I've started running at least 4 days each week, but I'm starting to get some knee and ankle pain. Any ideas about how to address this short of running less?"

"I need a good prank to play on my supervisor. She has a good sense of humor, but is overly professional. Any ideas?"

"I just moved and have the perfect spot for a plant in my living room. It gets a lot of light from the north and south, but I know I won't be too reliable with watering. Any suggestions for plants that won't die?"

"Should I wear brown or black shoes with a light brown suit?"

"I need to analyze a Spanish poem for class. What are some interesting Spanish poems that aren't too difficult to translate?"

"I always drive by men selling strawberries on Stanford Ave. How much do they charge per flat?"

"My girlfriend's ex bought her lots of expensive presents on anniversaries. I'm pretty broke, but want to show her that I care. Any ideas for things I could do that are not too cliche?"

Figure 5: A random set of queries from Aardvark

EXAMPLE 1
(Question from Mark C./M/LosAltos,CA)
I am looking for a restaurant in San Francisco that is open for lunch. Must be very high-end and fancy (this is for a small, formal, post-wedding gathering of about 8 people).
(+4 minutes -- Answer from Nick T./28/M/SanFrancisco,CA -- a friend of your friend Fritz Schwartz)
fringale (fringalesf.com) in soma is a good bet; small, fancy, french (the french actually hang out there too). Lunch: Tuesday - Friday: 11:30am - 2:30pm
(Reply from Mark to Nick)
Thanks Nick, you are the best PM ever!
(Reply from Nick to Mark)
you're very welcome. hope the days they're open for lunch work...

EXAMPLE 2
(Question from James R./M/TwinPeaksWest,SF)
What is the best new restaurant in San Francisco for a Monday business dinner? Fish & Farm? Gitane? Quince (a little older)?
(+7 minutes -- Answer from Paul D./M/SanFrancisco,CA -- A friend of your friend Sebastian V.)
For business dinner I enjoyed Kokkari Estiatorio at 200 Jackson. If you prefer a place in SOMA i recommend Ozumo (a great sushi restaurant).
(Reply from James to Paul)
thx I like them both a lot but I am ready to try something new
(+1 hour -- Answer from Fred M./29/M/Marina,SF)
Quince is a little fancy... La Mar is pretty fantastic for cevice - like the Slanted Door of peruvian food...

EXAMPLE 3
(Question from Brian T./22/M/Castro,SF)
What is a good place to take a spunky, off-the-cuff, social, and pretty girl for a nontraditional, fun, memorable dinner date in San Francisco?
(+4 minutes -- Answer from Dan G./M/SanFrancisco,CA)
Start with drinks at NocNoc (cheap, beer/wine only) and then dinner at RNM (expensive, across the street).
(Reply from Brian to Dan)
Thanks!
(+6 minutes -- Answer from Anthony D./M/Sunnyvale,CA -- you are both in the Google group)
Take her to the ROTL production of Tommy, in the Mission. Best show i've seen all year!
(Reply from Brian to Anthony)
Tommy as in the Who's rock opera? COOL!
(+10 minutes -- Answer from Bob F./M/Mission,SF -- you are connected through Mathias' friend Samantha S.)
Cool question. Spork is usually my top choice for a first date, because in addition to having great food and good really friendly service, it has an atmosphere that's perfectly in between casual and romantic. It's a quirky place, interesting funny menu, but not exactly non-traditional in the sense that you're not eating while suspended from the ceiling or anything

Figure 6: Three complete Aardvark interactions
This willingness to be helpful persists because when users do answer questions, they report that it is a very gratifying experience: they have been selected by Aardvark because of their expertise, they were able to help someone who had a need in the moment, and they are frequently thanked for their help by the asker.
   In order to play the role of intermediary in an ongoing conversation, Aardvark must have some basic conversational intelligence in order to understand where to direct messages from a user: is a given message a new question, a continuation of a previous question, an answer to an earlier question, or a command to Aardvark? The details of how the Conversation Manager manages these complications and disambiguates user messages are not essential to the social search model, so they are not elaborated here; but the basic approach is to use a state machine to model the discourse context.
   In all of the interfaces, wrappers around the messages from another user include information about the user that can facilitate trust: the user’s Real Name nametag, with their name, age, gender, and location; the social connection between you and the user (e.g., “Your friend on Facebook”, “A friend of your friend Marshall Smith”, “You are both in the Stanford group”, etc.); a selection of topics the user has expertise in; and summary statistics of the user’s activity on Aardvark (e.g., number of questions recently asked or answered).
   Finally, it is important throughout all of the above interactions that Aardvark maintains a tone of voice which is friendly, polite, and appreciative. A social search engine depends upon the goodwill and interest of its users, so it is important to demonstrate the kind of (linguistic) behavior that can encourage these sentiments, in order to set a good example for users to adopt. Indeed, in user interviews, users often express their desire to have examples of how to speak or behave socially when using Aardvark; since it is a novel paradigm, users do not immediately realize that they can behave in the same ways they would in a comparable real-world situation of asking for help and offering assistance. All of the language that Aardvark uses is intended both to be a communication mechanism between Aardvark and the user and an example of how to interact with Aardvark.
   Overall, a large body of research [6, 1, 5, 16] shows that when you provide a 1-1 communication channel, use real identities rather than pseudonyms, facilitate interactions between existing real-world relationships, and consistently provide examples of how to behave, users in an online community will behave in a manner that is far more authentic and helpful than in pseudonymous multicasting environments with no moderators. The design of Aardvark’s UI has been carefully crafted around these principles.

4. EXAMPLES

   In this section we take a qualitative look at user behavior on Aardvark. Figure 5 shows a random sample of questions asked on Aardvark in this period. Figure 6 takes a closer look at three questions sent to Aardvark during this period, all three of which were categorized by the Question Analyzer under the primary topic “restaurants in San Francisco”.5
   In Example 1, Aardvark opened 3 channels with candidate answerers, which yielded 1 answer. An interesting (and not uncommon) aspect of this example is that the asker and the answerer in fact were already acquaintances, though only listed as “friends-of-friends” in their online social graphs; and they had a quick back-and-forth chat through Aardvark.
   In Example 2, Aardvark opened 4 channels with candidate answerers, which yielded two answers. For this question, the “referral” feature was useful: one of the candidate answerers was unable to answer, but referred the question along to a
_________________
5 Names and affiliations have been changed to protect privacy.
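The paper says only that the Conversation Manager uses a state machine over discourse context to decide whether an incoming message is a new question, a continuation, an answer, or a command. A toy version of that idea might look like the following; the states, command set, and heuristics are all invented for illustration and are not Aardvark's actual logic.

```python
# Illustrative state machine for classifying an incoming user message.
# States and heuristics are assumptions for the sketch, not Aardvark's.

COMMANDS = {"sure", "pass", "busy", "more", "why", "try"}

class ConversationManager:
    def __init__(self):
        self.state = "idle"  # idle -> question_pending / answering

    def classify(self, message):
        text = message.strip().lower()
        if text in COMMANDS:
            return "command"
        if self.state == "answering":
            # Free text while a relayed question is open is read as an answer
            return "answer"
        if self.state == "question_pending":
            return "question_continuation"
        return "new_question"

    def handle(self, message):
        kind = self.classify(message)
        if kind == "command" and message.strip().lower() == "sure":
            self.state = "answering"   # user agreed to see the question
        elif kind == "new_question":
            self.state = "question_pending"
        elif kind == "answer":
            self.state = "idle"        # answer relayed; exchange closes
        return kind
```

A real system would track many parallel conversations and richer discourse cues, but even this tiny sketch shows why state matters: the same free-text message means different things depending on whether a question is currently open.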
Figure 7: Aardvark user growth

Figure 8: Categories of questions sent to Aardvark (websites & internet apps; music, movies, TV, books; technology & programming; business research; sports & recreation; home & cooking; finance & investing; local services; product reviews & help; travel; restaurants & bars; miscellaneous; Aardvark)
friend to answer. The asker wanted more choices after the first answer, and resubmitted the question to get another recommendation, which came from a user whose profile topics related to business and finance (in addition to dining).
   In Example 3, Aardvark opened 10 channels with candidate answerers, yielding 3 answers. The first answer came from someone with only a distant social connection to the asker; the second answer came from a coworker; and the third answer came from a friend-of-friend-of-friend. The third answer, which is the most detailed, came from a user who has topics in his profile related to both “restaurants” and “dating”.
   One of the most interesting features of Aardvark is that it allows askers to get answers that are hypercustomized to their information need. Very different restaurant recommendations are appropriate for a date with a spunky and spontaneous young woman, a post-wedding small formal family gathering, and a Monday evening business meeting — and human answerers are able to recognize these constraints. It is also interesting to note that in most of these examples (as in the majority of Aardvark questions), the asker took the time to thank the answerer for helping out.

Figure 9: Distribution of questions and answering times (questions answered, bucketed by time to answer: 0-3 min, 3-6 min, 6-12 min, 12-30 min, 30 min-1 hr, 1-4 hr, 4+ hr)

5. ANALYSIS

   The following statistics give a picture of the current usage and performance of Aardvark.
   Aardvark was first made available semi-publicly in a beta release in March of 2009. From March 1, 2009 to October 20, 2009, the number of users grew to 90,361, having asked a total of 225,047 questions and given 386,702 answers. All of the statistics below are taken from the last month of this period (9/20/2009-10/20/2009).

Aardvark is actively used: As of October, 2009, 90,361 users have created accounts on Aardvark, growing organically from 2,272 users since March 2009. In this period, 50,526 users (55.9% of the user base) generated content on Aardvark (i.e., asked or answered a question), while 66,658 users (73.8% of the user base) passively engaged (i.e., either referred or tagged other people’s questions). The average query volume was 3,167.2 questions per day in October 2009.

Mobile users are particularly active: Mobile users had an average of 3.6322 sessions per month, which is surprising on two levels. First, mobile users of Aardvark are more active than desktop users. (As a point of comparison, on Google, desktop users are almost 3 times as active as mobile users [10].) Second, mobile users of Aardvark are almost as active in absolute terms as mobile users of Google (who have on average 5.68 mobile sessions per month [10]). This is quite surprising for a service that has only been available for 6 months.
   We believe this is for two reasons. First, browsing through traditional web search results on a phone is unwieldy. On a phone, it’s more useful to get a single short answer that’s crafted exactly to your query. Second, people are used to using natural language with phones, and so Aardvark’s query model feels natural in that context. These considerations (and early experiments) also suggest that Aardvark mobile users will be similarly active with voice-based search.

Questions are highly contextualized: As compared to web search, where the average query length is between 2.2 and 2.9 words [10, 14], with Aardvark the average query length is 18.6 words (median=13). While some of this increased length is due to the increased usage of function words, 45.3% of these words are content words that give context to the query. In other words, as compared to traditional web search, Aardvark questions have 3–4 times as much context.
   The addition of context results in a greater diversity of
    questions per day in this period, and the median active                                  queries. While in Web search, between 57 and 63% of
    user issued 3.1 queries per month. Figure 7 shows the                                    queries are unique [14, 15], in Aardvark 98.1% of ques-
    number of users per month from early testing through                                     tions are unique (and 98.2% of answers are unique).
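The “3–4 times as much context” figure follows directly from the reported statistics; a minimal back-of-the-envelope sketch (our own arithmetic, not code from the Aardvark system):

```python
# Reported statistics from the analysis above.
aardvark_avg_words = 18.6      # average Aardvark question length, in words
content_word_share = 0.453     # fraction of those words that are content words
web_query_words = (2.2, 2.9)   # reported range of average web query length [10, 14]

# Context-bearing words per Aardvark question: 18.6 * 0.453 = ~8.4 words.
aardvark_content_words = aardvark_avg_words * content_word_share

# Ratio of context-bearing words, Aardvark question vs. web query.
ratios = [aardvark_content_words / w for w in web_query_words]
print(f"content words per Aardvark question: {aardvark_content_words:.1f}")
print(f"context multiple vs. web search: {min(ratios):.1f}x to {max(ratios):.1f}x")
```

Treating each short web query as essentially all content words, this yields roughly a 2.9x to 3.8x multiple, i.e., the 3–4x figure cited above.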
Figure 10: Distribution of questions and number of answers received.

Figure 11: Distribution of percentage of users and number of topics.

Questions often have a subjective element: A manual tally of
     1,000 random questions between March and October of 2009
     shows that 64.7% of queries have a subjective element to
     them (for example, “Do you know of any great delis in
     Baltimore, MD?” or “What are the things/crafts/toys your
     children have made that made them really proud of
     themselves?”). In particular, advice or recommendation
     queries regarding travel, restaurants, and products are
     very popular. A large number of queries are locally
     oriented: about 10% of questions related to local services,
     and 13% dealt with restaurants and bars. Figure 8 shows the
     top categories of questions sent to Aardvark. The
     distribution is not dissimilar to that found with
     traditional web search engines [2], but with a much smaller
     proportion of reference, factual, and navigational queries,
     and a much greater proportion of experience-oriented,
     recommendation, local, and advice queries.

Questions get answered quickly: 87.7% of questions submitted to
     Aardvark received at least one answer, and 57.2% received
     their first answer in less than 10 minutes. On average, a
     question received 2.08 answers (Figure 10),⁶ and the median
     answering time was 6 minutes and 37 seconds (Figure 9). By
     contrast, on public question-and-answer forums such as
     Yahoo! Answers [7], most questions are not answered within
     the first 10 minutes, and for questions asked on Facebook,
     only 15.7% of questions are answered within 15 minutes
     [12]. (Of course, corpus-based search engines such as
     Google return results in milliseconds, but many of the
     types of questions that are asked of Aardvark require
     extensive browsing and query refinement when asked on
     corpus-based search engines.)

Answers are high quality: Aardvark answers are both
     comprehensive and concise. The median answer length was
     22.2 words; 22.9% of answers were over 50 words (i.e., the
     length of a whole paragraph); and 9.1% of answers included
     hypertext links. 70.4% of the in-line feedback which askers
     provided on the answers they received rated the answers as
     ‘good’, 14.1% rated the answers as ‘OK’, and 15.5% rated
     the answers as ‘bad’.

There is a broad range of answerers: 78,343 users (86.7% of
     users) have been contacted by Aardvark with a request to
     answer a question; of those, 70% have asked to look at the
     question, and 38.0% have been able to answer. Additionally,
     15,301 users (16.9% of all users) have contacted Aardvark
     of their own initiative to try answering a question (see
     Section 3.6 for an explanation of these two modes of
     answering questions). Altogether, 45,160 users (50.0% of
     the total user base) have answered a question; this is 75%
     of all users who interacted with Aardvark at all in the
     period (66,658 users). As a comparison, only 27% of Yahoo!
     Answers users have ever answered a question [7]. While a
     smaller portion of the Aardvark user base is much more
     active in answering questions (approximately 20% of the
     user base is responsible for 85% of the total number of
     answers delivered to date), the distribution of answers
     across the user base is far broader than on a typical
     user-generated-content site [7].

Social proximity matters: Of questions that were routed to
     somebody in the asker’s social network (most commonly a
     friend of a friend), 76% of the in-line feedback rated the
     answer as ‘good’, whereas for answers that came from
     outside the asker’s social network, 68% were rated as
     ‘good’.

People are indexable: 97.7% of the user base has at least 3
     topics in their profiles, and the median user has 9 topics
     in her profile. In sum, Aardvark users added 1,199,323
     topics to their profiles; not counting overlapping topics,
     this yields a total of 174,605 distinct topics in which the
     current Aardvark user base has expertise. The currently
     most popular topics in user profiles are “music”, “movies”,
     “technology”, and “cooking”, but in general most topics are
     as specific as “logo design” and “San Francisco pickup
     soccer”.

⁶A question may receive more than one answer when the Routing
Policy allows Aardvark to contact more than one candidate
answerer in parallel for a given question, or when the asker
resubmits their question to request a second opinion.

6. EVALUATION
   To evaluate social search compared to web search, we ran a
side-by-side experiment with Google on a random sample of
Aardvark queries. We inserted a “Tip” into a random sample of
active questions on Aardvark that read: “Do you want to help
Aardvark run an experiment?” with a link to
an instruction page that asked the user to reformulate their
question as a keyword query and search on Google. We asked the
users to time how long it took to find a satisfactory answer on
both Aardvark and Google, and to rate the answers from both on
a 1–5 scale. If it took longer than 10 minutes to find a
satisfactory answer, we instructed the user to give up. Of the
200 respondents in the experiment set, we found that 71.5% of
the queries were answered successfully on Aardvark, with a mean
rating of 3.93 (σ = 1.23), while 70.5% of the queries were
answered successfully on Google, with a mean rating of 3.07
(σ = 1.46). The median time-to-satisfactory-response for
Aardvark was 5 minutes (of passive waiting), while the median
time-to-satisfactory-response for Google was 2 minutes (of
active searching).
   Of course, since this evaluation involves reviewing questions
which users actually sent to Aardvark, we should expect that
Aardvark would perform well; after all, users chose these
particular questions to send to Aardvark because of their belief
that it would be helpful in these cases. And in many cases, the
users in the experiment noted that they sent their question to
Aardvark specifically because a previous Google search on the
topic was difficult to formulate or did not give a satisfactory
result.⁷
   Thus we cannot conclude from this evaluation that social
search will be equally successful for all kinds of questions.
Further, we would assume that if the experiment were reversed,
and we used as our test set a random sample from Google’s query
stream, the results of the experiment would be quite different.
Indeed, for questions such as “What is the train schedule from
Middletown, NJ to New York Penn Station?”, traditional web
search is a preferable option.
   However, the questions asked of Aardvark do represent a large
and important class of information need: they are typical of the
kind of subjective questions for which it is difficult for
traditional web search engines to provide satisfying results.
The questions include background details and elements of context
that specify exactly what the asker is looking for, and it is
not obvious how to translate these information needs into
keyword searches. Further, there are not always existing web
pages that contain exactly the content that is being sought;
and in any event, it is difficult for the asker to assess
whether any content that is returned is trustworthy or right for
them. In these cases, askers are looking for personal opinions,
suggestions, recommendations, or advice, from someone they feel
a connection with and trust. The desire to have a fellow human
being understand what you are looking for and respond in a
personalized manner in real time is one of the main reasons why
social search is an appealing mechanism for information
retrieval.

7. ACKNOWLEDGMENTS
   We’d like to thank Max Ventilla and the entire Aardvark team
for their contributions to the work reported here. We’d
especially like to thank Rob Spiro for his assistance on this
paper. And we are grateful to the authors of the original
“Anatomy of a Large-Scale Hypertextual Web Search Engine” [3],
from which we derived the title and the inspiration for this
paper.

⁷For example: “Which golf courses in the San Francisco Bay Area
have the best drainage / are the most playable during the winter
(especially with all of the rain we’ve been getting)?”

8. REFERENCES
 [1] H. Bechar-Israeli. From <Bonehead> to <cLoNehEAd>:
     Nicknames, play, and identity on Internet Relay Chat.
     Journal of Computer-Mediated Communication, 1995.
 [2] S. M. Beitzel, E. C. Jensen, A. Chowdhury, D. Grossman,
     and O. Frieder. Hourly analysis of a very large topically
     categorized web query log. In Proceedings of the 27th
     SIGIR, 2004.
 [3] S. Brin and L. Page. The anatomy of a large-scale
     hypertextual Web search engine. In Computer Networks and
     ISDN Systems, 1998.
 [4] T. Condie, S. D. Kamvar, and H. Garcia-Molina. Adaptive
     peer-to-peer topologies. In Proceedings of the Fourth
     International Conference on Peer-to-Peer Computing, 2004.
 [5] A. R. Dennis and S. T. Kinney. Testing media richness
     theory in the new media: The effects of cues, feedback,
     and task equivocality. Information Systems Research, 1998.
 [6] J. Donath. Identity and deception in the virtual
     community. Communities in Cyberspace, 1998.
 [7] Z. Gyongyi, G. Koutrika, J. Pedersen, and
     H. Garcia-Molina. Questioning Yahoo! Answers. In
     Proceedings of the First WWW Workshop on Question
     Answering on the Web, 2008.
 [8] T. Hofmann. Probabilistic latent semantic indexing. In
     Proceedings of the 22nd SIGIR, 1999.
 [9] B. J. Jansen, A. Spink, and T. Saracevic. Real life, real
     users, and real needs: A study and analysis of user
     queries on the web. Information Processing and Management,
     2000.
[10] M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and
     iPhones and mobile phones, oh my!: A logs-based comparison
     of search users on different devices. In Proceedings of
     the 18th International Conference on World Wide Web, 2009.
[11] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and
     C. D. Manning. Combining heterogeneous classifiers for
     word-sense disambiguation. In Proceedings of the
     SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation,
     2002.
[12] M. R. Morris, J. Teevan, and K. Panovich. What do people
     ask their social networks, and why? A survey study of
     status message Q&A behavior. In Proceedings of CHI, 2010.
[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The
     PageRank citation ranking: Bringing order to the Web.
     Stanford Digital Libraries Working Paper, 1998.
[14] C. Silverstein, M. Henzinger, H. Marais, and M. Moricz.
     Analysis of a very large Web search engine query log. In
     SIGIR Forum, 1999.
[15] A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic. From
     e-sex to e-commerce: Web search changes. IEEE Computer,
     2002.
[16] L. Sproull and S. Kiesler. Computers, networks, and work.
     In Global Networks: Computers and International
     Communication. MIT Press, 1993.
[17] M. Wesch. An anthropological introduction to YouTube.
     Library of Congress, June 2008.

DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 

Último (20)

DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 

vark, and the design considerations that motivated them.

We believe this to be useful to the research community for two reasons. First, the argument made in the original Anatomy paper [3] still holds true — since most search engine development is done in industry rather than academia, the research literature describing end-to-end search engine architecture is sparse. Second, the shift in paradigm opens up a number of interesting research questions in information retrieval, for example around human expertise classification, implicit network construction, and conversation design.

Following the architecture description, we present a statistical analysis of usage patterns in Aardvark. We find that, as compared to traditional search, Aardvark queries tend to be long, highly contextualized and subjective — in short, they tend to be the types of queries that are not well-serviced by traditional search engines. We also find that the vast majority of questions get answered promptly and satisfactorily, and that users are surprisingly active, both in asking and answering.

Finally, we present example results from the current Aardvark system, and a comparative evaluation experiment. What we find is that Aardvark performs very well on queries that deal with opinion, advice, experience, or recommendations, while traditional corpus-based search engines remain a good choice for queries that are factual or navigational and lack a subjective element.

1. INTRODUCTION

1.1 The Library and the Village

Traditionally, the basic paradigm in information retrieval has been the library. Indeed, the field of IR has roots in the library sciences, and Google itself came out of the Stanford Digital Library project [13]. While this paradigm has clearly worked well in several contexts, it ignores another age-old model for knowledge acquisition, which we shall call "the village paradigm". In a village, knowledge dissemination is achieved socially — information is passed from person to person, and the retrieval task consists of finding the right person, rather than the right document, to answer your question.

The differences in how people find information in a library versus a village suggest some useful principles for designing a social search engine. In a library, people use keywords to search, the knowledge base is created by a small number of content publishers before the questions are asked, and trust is based on authority. In a village, by contrast, people use natural language to ask questions, answers are generated in real-time by anyone in the community, and trust is based on intimacy. These properties have cascading effects — for example, real-time responses from socially proximal responders tend to elicit (and work well for) highly contextualized and subjective queries. For example, the query "Do you have any good babysitter recommendations in Palo Alto for my 6-year-old twins? I'm looking for somebody that won't let them watch TV." is better answered by a friend than the library.

Copyright is held by the author/owner(s). WWW2010, April 26-30, 2010, Raleigh, North Carolina.
2. OVERVIEW

2.1 Main Components

The main components of Aardvark are:

1. Crawler and Indexer. To find and label resources that contain information — in this case, users, not documents (Sections 3.2 and 3.3).
2. Query Analyzer. To understand the user's information need (Section 3.4).
3. Ranking Function. To select the best resources to provide the information (Section 3.5).
4. UI. To present the information to the user in an accessible and interactive form (Section 3.6).

Most corpus-based search engines have similar key components with similar aims [3], but the means of achieving those aims are quite different.

Before discussing the anatomy of Aardvark in depth, it is useful to describe what happens behind the scenes when a new user joins Aardvark and when a user asks a question.

[Figure 1: Schematic of the architecture of Aardvark. Gateways (IM, email, Twitter, iPhone, web, SMS) feed a Transport Layer and Conversation Manager, which consult the Question Analyzer and Routing Engine; Importers (Facebook, LinkedIn, web content, etc.) populate an index of users, topics, and content.]

2.2 The Initiation of a User

When a new user first joins Aardvark, the Aardvark system performs a number of indexing steps in order to be able to direct the appropriate questions to her for answering. Because questions in Aardvark are routed to the user's extended network, the first step involves indexing friendship and affiliation information. The data structure responsible for this is the Social Graph. Aardvark's aim is not to build a social network, but rather to allow people to make use of their existing social networks. As such, in the sign-up process, a new user has the option of connecting to a social network such as Facebook or LinkedIn, importing their contact lists from a webmail program, or manually inviting friends to join. Additionally, anybody whom the user invites to join Aardvark is appended to their Social Graph — and such invitations are a major source of new users. Finally, Aardvark users are connected through common "groups" which reflect real-world affiliations they have, such as the schools they have attended and the companies they have worked at; these groups can be imported automatically from social networks, or manually created by users. Aardvark indexes this information and stores it in the Social Graph, which is a fixed-width ISAM index sorted by userId.

Simultaneously, Aardvark indexes the topics about which the new user has some level of knowledge or experience. This topical expertise can be garnered from several sources: a user can indicate topics in which he believes himself to have expertise; a user's friends can indicate which topics they trust the user's opinions about; a user can specify an existing structured profile page from which the Topic Parser parses additional topics; a user can specify an account on which they regularly post status updates (e.g., Twitter or Facebook), from which the Topic Extractor extracts topics (from unstructured text) on an ongoing basis (see Section 3.3 for more discussion); and finally, Aardvark observes the user's behavior on Aardvark, in answering (or electing not to answer) questions about particular topics.

The set of topics associated with a user is recorded in the Forward Index, which stores each userId, a scored list of topics, and a series of further scores about a user's behavior (e.g., pertaining to their responsiveness and the quality of their answers).

From the Forward Index, Aardvark constructs an Inverted Index. The Inverted Index stores each topicId and a scored list of userIds that have expertise in that topic. In addition to topics, the Inverted Index stores scored lists of userIds for features like answer quality and response time.

Once the Inverted Index and Social Graph for a user are created, the user is now active on the system and ready to ask her first question.

2.3 The Life of a Query

A user can access Aardvark through many different means, the most common being instant message (on the desktop) or text message (on the mobile phone). A user begins by asking a question. The question gets sent from the input device to the Transport Layer, where it is normalized to a Message data structure, and sent to the Conversation Manager. Once the Conversation Manager determines that the message is a question, it sends the question to the Question Analyzer to determine the appropriate classifications and topics for the question. The Conversation Manager informs the asker which primary topic was determined for the question, and gives the asker the opportunity to edit it. It simultaneously issues a Routing Suggestion Request to the Routing Engine. The routing engine plays a role analogous to the ranking function in a corpus-based search engine. It accesses the Inverted Index and Social Graph for a list of candidate answerers, and then ranks them to reflect how well it believes they can answer the question, and how good of a match they are for the asker. The Routing Engine then returns a ranked list of Routing Suggestions to the Conversation Manager. The Conversation Manager contacts the potential answerers — one by one, or a few at a time, depending upon a Routing Policy — and asks them if they would like to answer the question, until a satisfactory answer is found. The Conversation Manager then forwards this answer along to the asker, and allows the asker and answerer to exchange further follow-up messages if they choose.
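The "life of a query" flow described above can be sketched end to end. This is a toy illustration, not Aardvark's implementation: the dicts stand in for the Inverted Index and Social Graph, and all function names (analyze_question, route) are invented for this sketch.

```python
# Illustrative walk-through of the "life of a query" (Section 2.3).
# All data and names here are hypothetical stand-ins.

INVERTED_INDEX = {            # topic -> {user: score for that topic}
    "sushi":  {"alice": 0.7, "bob": 0.3},
    "austin": {"bob": 0.6, "carol": 0.4},
}

SOCIAL_GRAPH = {              # asker -> {candidate: asker/answerer affinity}
    "dave": {"alice": 0.5, "bob": 0.4, "carol": 0.1},
}

def analyze_question(text):
    """Toy Question Analyzer: map a question to a scored topic list."""
    words = set(text.lower().split())
    hits = [t for t in INVERTED_INDEX if t in words]
    return {t: 1.0 / len(hits) for t in hits} if hits else {}

def route(asker, question, max_contacts=2):
    """Toy Routing Engine: rank candidate answerers for this asker by
    combining topic relevance with asker/answerer affinity."""
    scores = {}
    for topic, p_tq in analyze_question(question).items():
        for user, p_ut in INVERTED_INDEX[topic].items():
            affinity = SOCIAL_GRAPH.get(asker, {}).get(user, 0.0)
            scores[user] = scores.get(user, 0.0) + affinity * p_ut * p_tq
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    # The Conversation Manager would now contact these candidates a few
    # at a time (per the Routing Policy) until one answers satisfactorily.
    return ranked[:max_contacts]

print(route("dave", "what is a great sushi restaurant in austin ?"))
```

Note how the ranking rewards candidates who match the asker socially as well as topically: bob outranks alice here because he covers both topics for this asker, even though alice's "sushi" score is higher.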
3. ANATOMY

3.1 The Model

The core of Aardvark is a statistical model for routing questions to potential answerers. We use a network variant of what has been called an aspect model [8], that has two primary features. First, it associates an unobserved class variable t ∈ T = {t1, . . . , tn} with each observation (i.e., the successful answer of question q by user ui). In other words, the probability p(ui|q) that user i will successfully answer question q depends on whether q is about the topics t in which ui has expertise:

    p(ui|q) = Σ_{t ∈ T} p(ui|t) p(t|q)    (1)

The second main feature of the model is that it defines a query-independent probability of success for each potential asker/answerer pair (ui, uj), based upon their degree of social connectedness and profile similarity. In other words, we define a probability p(ui|uj) that user ui will deliver a satisfying answer to user uj, regardless of the question. This is reminiscent of earlier work in search in P2P networks, where queries have been routed over relationship-based overlay networks [4].

We then define the scoring function s(ui, uj, q) as the composition of the two probabilities:

    s(ui, uj, q) = p(ui|uj) · p(ui|q) = p(ui|uj) Σ_{t ∈ T} p(ui|t) p(t|q)    (2)

Our goal in the ranking problem is: given a question q from user uj, return a ranked list of users ui ∈ U that maximizes s(ui, uj, q).

Note that the scoring function is composed of a query-dependent relevance score p(ui|q) and a query-independent quality score p(ui|uj). This bears similarity to the ranking functions of traditional corpus-based search engines such as Google [3]. The difference is that unlike quality scores like PageRank [13], Aardvark's quality score aims to measure intimacy rather than authority. And unlike the relevance scores in corpus-based search engines, Aardvark's relevance score aims to measure a user's potential to answer a query, rather than a document's existing capability to answer a query.

Computationally, this scoring function has a number of advantages. It allows real-time routing because it pushes much of the computation offline. The only component probability that needs to be computed at query time is p(t|q). Computing p(t|q) is equivalent to assigning topics to a question — in Aardvark we do this by running a probabilistic classifier on the question at query time (see Section 3.4). The distribution p(ui|t) assigns users to topics, and the distribution p(ui|uj) defines the Aardvark Social Graph. Both of these are computed by the Indexer at signup time, and then updated continuously in the background as users answer questions and get feedback (see Section 3.3). The component multiplications and sorting are also done at query time, but these are easily parallelizable, as the index is sharded by user.

3.2 Social Crawling

A comprehensive knowledge base is important for search engines as query distributions tend to have a long tail [9]. In corpus-based search engines, this is achieved by large-scale crawlers and thoughtful crawl policies. In Aardvark, the knowledge base consists of people rather than documents, so the methods for acquiring and expanding a comprehensive knowledge base are quite different.

With Aardvark, the more active users there are, the more potential answerers there are, and therefore the more comprehensive the coverage. More importantly, because Aardvark looks for answerers primarily within a user's extended social network, the denser the network, the larger the effective knowledge base.

This suggests that the strategy for increasing the knowledge base of Aardvark crucially involves creating a good experience for users so that they remain active and are inclined to invite their friends. An extended discussion of this is outside of the scope of this paper; we mention it here only to emphasize the difference in the nature of "crawling" in social search versus traditional search.

Given a set of active users on Aardvark, the effective breadth of the Aardvark knowledge base depends upon designing interfaces and algorithms that can collect and learn an extended topic list for each user over time, as discussed in the next section.

3.3 Indexing People

The central technical challenge in Aardvark is selecting the right user to answer a given question from another user. In order to do this, the two main things Aardvark needs to learn about each user ui are: (1) the topics t he might be able to answer questions about, p_smoothed(t|ui); (2) the users uj to whom he is connected, p(ui|uj).

Topics. Aardvark computes the distribution p(t|ui) of topics known by user ui from the following sources of information:

• Users are prompted to provide at least three topics which they believe they have expertise about.

• Friends of a user (and the person who invited a user) are encouraged to provide a few topics that they trust the user's opinion about.

• Aardvark parses out topics from users' existing online profiles (e.g., Facebook profile pages, if provided). For such pages with a known structure, a simple Topic Parsing algorithm uses regular expressions which were manually devised for specific fields in the pages, based upon their performance on test data.

• Aardvark automatically extracts topics from unstructured text on users' existing online homepages or blogs if provided. For unstructured text, a linear SVM identifies the general subject area of the text, while an ad-hoc named entity extractor is run to extract more specific topics, scaled by a variant tf-idf score.

• Aardvark automatically extracts topics from users' status message updates (e.g., Twitter messages, Facebook news feed items, IM status messages, etc.) and from the messages they send to other users on Aardvark.

The motivation for using these latter sources of profile topic information is a simple one: if you want to be able to predict what kind of content a user will generate (i.e., p(t|ui)), first examine the content they have generated in the past. In this spirit, Aardvark uses web content not as a source of existing answers about a topic, but rather, as an indicator of the topics about which a user is likely able to give new answers on demand.

In essence, this involves modeling a user as a content generator, with probabilities indicating the likelihood that she will respond to questions about given topics. Each topic in a user profile has an associated score, depending upon the confidence appropriate to the source of the topic.
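The idea of scoring each profile topic by the confidence of its source can be sketched as follows. The source names and weight values are illustrative assumptions; the paper specifies only that each topic's score depends on the confidence appropriate to its source.

```python
from collections import defaultdict

# Hypothetical per-source confidence weights; the paper does not give values,
# only that self-reported and friend-endorsed topics carry more confidence
# than topics extracted automatically from text.
SOURCE_WEIGHT = {
    "self_reported":    1.0,
    "friend_endorsed":  0.8,
    "parsed_profile":   0.5,
    "extracted_status": 0.3,
}

def topic_distribution(observations):
    """Combine (source, topic) observations into a normalized
    topic distribution p(t|u) for one user."""
    scores = defaultdict(float)
    for source, topic in observations:
        scores[topic] += SOURCE_WEIGHT[source]
    total = sum(scores.values())
    return {t: s / total for t, s in scores.items()}

obs = [
    ("self_reported",    "cooking"),
    ("friend_endorsed",  "cooking"),
    ("extracted_status", "jazz"),
]
print(topic_distribution(obs))
# "cooking" accumulates weight 1.8 of 2.1 total, "jazz" 0.3 of 2.1.
```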
In addition, Aardvark learns over time which topics not to send a user questions about by keeping track of cases when the user: (1) explicitly "mutes" a topic; (2) declines to answer questions about a topic when given the opportunity; (3) receives negative feedback on his answer about the topic from another user.

Periodically, Aardvark will run a topic strengthening algorithm, the essential idea of which is: if a user has expertise in a topic and most of his friends also have some expertise in that topic, we have more confidence in that user's level of expertise than if he were alone in his group with knowledge in that area. Mathematically, for some user ui, his group of friends U, and some topic t, if p(t|ui) ≠ 0, then s(t|ui) = p(t|ui) + γ Σ_{u ∈ U} p(t|u), where γ is a small constant. The s values are then renormalized to form probabilities.

Aardvark then runs two smoothing algorithms, the purpose of which is to record the possibility that the user may be able to answer questions about additional topics not explicitly recorded in her profile. The first uses basic collaborative filtering techniques on topics (i.e., based on users with similar topics), the second uses semantic similarity [1].

Once all of these bootstrap, extraction, and smoothing methods are applied, we have a list of topics and scores for a given user. Normalizing these topic scores so that Σ_{t ∈ T} p(t|ui) = 1, we have a probability distribution for topics known by user ui. Using Bayes' Law, we compute for each topic and user:

    p(ui|t) = p(t|ui) p(ui) / p(t)    (3)

using a uniform distribution for p(ui) and observed topic frequencies for p(t).

Aardvark collects these probabilities p(ui|t) indexed by topic into the Inverted Index, which allows for easy lookup when a question comes in.

Connections. Aardvark computes the connectedness between users p(ui|uj) in a number of ways. While social proximity is very important here, we also take into account similarities in demographics and behavior. The factors considered here include:

• Social connection (common friends and affiliations)
• Demographic similarity
• Profile similarity (e.g., common favorite movies)
• Vocabulary match (e.g., IM shortcuts)
• Chattiness match (frequency of follow-up messages)
• Verbosity match (the average length of messages)
• Politeness match (e.g., use of "Thanks!")
• Speed match (responsiveness to other users)

Connection strengths between people are computed using a weighted cosine similarity over this feature set, normalized so that Σ_{ui ∈ U} p(ui|uj) = 1, and stored in the Social Graph for quick access at query time. Both the distributions p(ui|uj) in the Social Graph and p(t|ui) in the Inverted Index are continuously updated as users interact with one another on Aardvark.

[1] In both the Person Indexing and the Question Analysis components, "semantic similarity" is computed by using an approximation of distributional similarity computed over Wikipedia and other corpora; this serves as a proxy measure of the topics' semantic relatedness.

3.4 Analyzing Questions

The purpose of the Question Analyzer is to determine a scored list of topics p(t|q) for each question q representing the semantic subject matter of the question. This is the only probability distribution in equation 2 that is computed at query time.

It is important to note that in a social search system, the requirement for a Question Analyzer is only to be able to understand the query sufficiently for routing it to a likely answerer. This is a considerably simpler task than the challenge facing an ideal web search engine, which must attempt to determine exactly what piece of information the user is seeking (i.e., given that the searcher must translate her information need into search keywords), and to evaluate whether a given web page contains that piece of information. By contrast, in a social search system, it is the human answerer who has the responsibility for determining the relevance of an answer to a question — and that is a function which human intelligence is extremely well-suited to perform! The asker can express his information need in natural language, and the human answerer can simply use her natural understanding of the language of the question, of its tone of voice, sense of urgency, sophistication or formality, and so forth, to determine what information is suitable to include in a response. Thus, the role of the Question Analyzer in a social search system is simply to learn enough about the question that it may be sent to appropriately interested and knowledgeable human answerers.

As a first step, the following classifiers are run on each question: A NonQuestionClassifier determines if the input is not actually a question (e.g., is it a misdirected message, a sequence of keywords, etc.); if so, the user is asked to submit a new question. An InappropriateQuestionClassifier determines if the input is obscene, commercial spam, or otherwise inappropriate content for a public question-answering community; if so, the user is warned and asked to submit a new question. A TrivialQuestionClassifier determines if the input is a simple factual question which can be easily answered by existing common services (e.g., "What time is it now?", "What is the weather?", etc.); if so, the user is offered an automatically generated answer resulting from traditional web search. A LocationSensitiveClassifier determines if the input is a question which requires knowledge of a particular location, usually in addition to specific topical knowledge (e.g., "What's a great sushi restaurant in Austin, TX?"); if so, the relevant location is determined and passed along to the Routing Engine with the question.

Next, the list of topics relevant to a question is produced by merging the output of several distinct TopicMapper algorithms, each of which suggests its own scored list of topics:

• A KeywordMatchTopicMapper passes any terms in the question which are string matches with user profile topics through a classifier which is trained to determine whether a given match is likely to be semantically significant or misleading. [2]

[2] For example, if the string "camel wrestling" occurs in a question, it is likely to be semantically relevant to a user who has "camel wrestling" as a profile topic; whereas the string "running" is too ambiguous to use in this manner without further validation, since it might errantly route a question about "running a business" to a user who knows about fitness.
• A TaxonomyTopicMapper classifies the question text into a taxonomy of roughly 3,000 popular question topics, using an SVM trained on an annotated corpus of several million questions.

• A SalientTermTopicMapper extracts salient phrases from the question (using a noun-phrase chunker and a tf-idf-based measure of importance) and finds semantically similar user topics.

• A UserTagTopicMapper takes any user "tags" provided by the asker (or by any would-be answerers) and maps these to semantically similar user topics. (A general principle in the design of Aardvark is to use human intelligence wherever possible to improve the quality of the system. For the present task of Question Analysis, this involves giving both askers and answerers prompts and simple commands for telling Aardvark directly what the subject matter of a question is.)

At present, the output distributions of these classifiers are combined by weighted linear combination. It would be interesting future work to explore other means of combining heterogeneous classifiers, such as the maximum entropy model described in [11].

The Aardvark TopicMapper algorithms are continuously evaluated by manual scoring on random samples of 1,000 questions. The topics used for selecting candidate answerers, as well as a much larger list of possibly relevant topics, are assigned scores by two human judges, with a third judge adjudicating disagreements. For the current algorithms on the current sample of questions, this process yields overall scores of 89% precision and 84% recall of relevant topics. In other words, roughly 9 out of 10 times Aardvark will be able to route a question to someone with relevant topics in her profile; and Aardvark will identify 5 out of every 6 possibly relevant answerers for each question based upon their topics.

3.5 The Aardvark Ranking Algorithm

Ranking in Aardvark is done by the Routing Engine, which determines an ordered list of users (or "candidate answerers") who should be contacted to answer a question, given the asker of the question and the information about the question derived by the Question Analyzer. The core ranking function is described by equation 2; essentially, the Routing Engine can be seen as computing equation 2 for all candidate answerers, sorting, and doing some postprocessing. The main factors that determine this ranking of users are Topic Expertise p(ui|q), Connectedness p(ui|uj), and Availability.

Topic Expertise: First, the Routing Engine finds the subset of users who are semantic matches to the question: those users whose profile topics indicate expertise relevant to the topics which the question is about. Users whose profile topics are closer matches to the question's topics are given higher rank. For questions which are location-sensitive (as defined above), only users with matching locations in their profiles are considered.

Connectedness: Second, the Routing Engine scores each user according to the degree to which she herself, as a person, independently of her topical expertise, is a good "match" for the asker for this information query. The goal of this scoring is to optimize the degree to which the asker and the answerer feel kinship and trust, arising from their sense of connection and similarity, and meet each other's expectations for conversational behavior in the interaction.

Availability: Third, the Routing Engine prioritizes candidate answerers so as to optimize the chances that the present question will be answered, while also preserving the available set of answerers (i.e., the quantity of "answering resource" in the system) as much as possible by spreading the answering load across the user base. This involves factors such as prioritizing users who are currently online (e.g., via IM presence data, iPhone usage, etc.), who are historically active at the present time of day, and who have not been contacted recently with a request to answer a question.

Given this ordered list of candidate answerers, the Routing Engine then filters out users who should not be contacted, according to Aardvark's guidelines for preserving a high-quality user experience. These filters operate largely as a set of rules: do not contact users who prefer not to be contacted at the present time of day; do not contact users who have recently been contacted as many times as their contact-frequency settings permit; etc.

Since this is all done at query time, and the set of candidate answerers can potentially be very large, it is useful to note that this process is parallelizable. Each shard in the Index computes its own ranking for the users in that shard and sends the top users to the Routing Engine. This is scalable as the user base grows, since as more users are added, more shards can be added.

The list of candidate answerers who survive this filtering process is returned to the Conversation Manager. The Conversation Manager then proceeds with opening channels to each of them, serially, inquiring whether they would like to answer the present question, and iterating until an answer is provided and returned to the asker.

3.6 User Interface

Since social search is modeled after the real-world process of asking questions to friends, the various user interfaces for Aardvark are built on top of the existing communication channels that people use to ask questions to their friends: IM, email, SMS, iPhone, Twitter, and Web-based messaging. Experiments were also done using actual voice input from phones, but this is not live in the current Aardvark production system.

[Figure 2: Screenshot of Aardvark over IM]

In its simplest form, the user interface for asking a question on Aardvark is any kind of text input mechanism, along with a mechanism for displaying textual messages returned from Aardvark. (This kind of very lightweight interface is important for making the search service available anywhere, especially now that mobile device usage is ubiquitous across most of the globe.)

However, Aardvark is most powerful when used through a chat-like interface that enables ongoing conversational interaction. A private 1-to-1 conversation creates an intimacy which encourages both honesty and freedom within the constraints of real-world social norms. (By contrast, answering forums where there is a public audience can both inhibit potential answerers [12] and motivate public performance rather than authentic answering behavior [17].) Further, in a real-time conversation it is possible for an answerer to request clarifying information from the asker about her question, or for the asker to follow up with further reactions or inquiries to the answerer.

There are two main interaction flows available in Aardvark for answering a question. The primary flow involves Aardvark sending a user a message (over IM, email, etc.) asking if she would like to answer a question: for example, "You there? A friend from the Stanford group has a question about *search engine optimization* that I think you might be able to answer." If the user responds affirmatively, Aardvark relays the question as well as the name of the questioner. The user may then simply type an answer to the question, type in a friend's name or email address to refer it to someone else who might answer, or simply "pass" on this request. (There is no shame in "passing" on a question, since nobody else knows that the question was sent to you. Similarly, there is no social cost to the user in asking a question, since, unlike in the real world, you are not directly imposing on a friend or requesting a favor; rather, Aardvark plays the role of the intermediary who bears this social cost.)

While this may seem like an entirely quotidian interaction between friends, it is actually quite unusual and innovative behavior for a search engine, and it is a key to Aardvark's ability to find answers to a wide range of questions: the available set of potential answerers is not just whatever users happen to be visiting a bulletin board at the time a question is posted, but rather the entire set of users that Aardvark has contact information for. Because this kind of "reaching out" to users has the potential to become an unwelcome interruption if it happens too frequently, Aardvark sends such requests for answers usually less than once a day to a given user (and users can easily change their contact settings, specifying preferred frequency and time of day for such requests). Further, users can ask Aardvark "why" they were selected for a particular question, and be given the option to easily change their profile if they do not want such questions in the future. This is very much like the real-world model of social information sharing: the person asking a question, or the intermediary in Aardvark's role, is careful not to impose too much upon a possible answerer. The ability to reach out to an extended network beyond a user's immediate friendships, without imposing too frequently on that network, provides a key differentiating experience from simply posting questions to one's Twitter or Facebook status message.

[Figure 3: Example of Aardvark interacting with an answerer]

aardvark: You there? I have a question about **tennis** that I think you might be able to answer. (Type 'sure', 'pass', or 'busy'.)
me: sure
aardvark: (From Michael B./29/M/NewYork,US) Does anyone know which health clubs in Manhattan have tennis courts? Or are all the good options in the other boroughs? (Type 'pass' to skip, or 'more' for extra options.)
me: why
aardvark: I thought of you for Michael's question because - Michael is a friend of your friend Sally Johnson. - You know a lot about **racquet sports** and **fitness** - You and he have similar interests in **sports** (Type 'more' or change settings at http://vark.com/a/XXXXX)
me: Well there is always the Midtown Tennis Club on 8th ave @27th if you really want to stay in manhattan -- but the quality isn't great. You'd do just as well to use the public courts in Central Park. Or another good option is to join NYHRC or NYSC in manhattan, and use their courts in other boroughs...
aardvark: Great -- I've sent that to Michael. Thanks for the fast answer! (Type 'Michael:' followed by a message to add something, or 'more' for options.)

A secondary flow of answering questions is more similar to traditional bulletin-board-style interactions: a user sends a message to Aardvark (e.g., "try") or visits the "Answering" tab of the Aardvark website or iPhone application (Figure 4), and Aardvark shows the user a recent question from her network which has not yet been answered and is related to her profile topics. This mode involves the user initiating the exchange when she is in the mood to try to answer a question; as such, it has the benefit of an eager potential answerer, but as the only mode of answering it does not effectively tap into the full diversity of the user base (since most users do not initiate these episodes). This is an important point: while almost everyone is happy to answer questions (see Section 5) to help their friends or people they are connected to, not everyone goes out of their way to do so.

[Figure 4: Screenshot of Aardvark Answering Tab on iPhone]
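The machinery of Sections 3.3 through 3.5 can be sketched end to end: topic strengthening and renormalization, Bayes inversion into the topic-keyed Inverted Index (equation 3), and routing that weights topic expertise by connectedness to the asker before applying availability filters. This is a minimal illustrative sketch, not Aardvark's actual code: the function names, the γ value, and the multiplicative combination p(ui|uj) · Σt p(ui|t) p(t|q) for equation 2 are assumptions, and the default connectedness of 0 for unlisted pairs is a simplification (the paper normalizes p(ui|uj) over all users).

```python
from collections import defaultdict

GAMMA = 0.1  # small strengthening constant gamma; 0.1 is an illustrative guess

def strengthen(profiles, friends):
    """Topic strengthening (Sec. 3.3): for topics with p(t|ui) != 0,
    s(t|ui) = p(t|ui) + gamma * sum of friends' p(t|u), then renormalize."""
    out = {}
    for ui, topics in profiles.items():
        raw = {t: p + GAMMA * sum(profiles[u].get(t, 0.0)
                                  for u in friends.get(ui, []))
               for t, p in topics.items() if p != 0}
        z = sum(raw.values())
        out[ui] = {t: v / z for t, v in raw.items()} if z else {}
    return out

def invert(profiles):
    """Bayes inversion (Eq. 3): p(ui|t) = p(t|ui) p(ui) / p(t), with a
    uniform p(ui) and observed topic frequencies for p(t), keyed by topic."""
    p_u = 1.0 / len(profiles)
    p_t = defaultdict(float)
    for topics in profiles.values():
        for t, p in topics.items():
            p_t[t] += p * p_u
    index = defaultdict(dict)
    for ui, topics in profiles.items():
        for t, p in topics.items():
            index[t][ui] = p * p_u / p_t[t]
    return index

def route(question_topics, asker, index, social_graph, is_available, k=10):
    """Routing sketch (Sec. 3.5): expertise sum_t p(ui|t) p(t|q), weighted
    by connectedness p(ui|uj), then availability filtering and truncation."""
    expertise = defaultdict(float)
    for t, p_tq in question_topics.items():
        for ui, p_ut in index.get(t, {}).items():
            expertise[ui] += p_ut * p_tq
    conn = social_graph.get(asker, {})  # simplification: missing pairs get 0
    ranked = sorted(((s * conn.get(u, 0.0), u) for u, s in expertise.items()
                     if u != asker), reverse=True)
    return [u for score, u in ranked if score > 0 and is_available(u)][:k]
```

Because connectedness multiplies expertise, a well-connected but less expert candidate can outrank a stranger with stronger topic scores, which matches the design goal of routing to answerers the asker will trust.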
[Figure 5: A random set of queries from Aardvark]

"What fun bars downtown have outdoor seating?"
"I'm just getting into photography. Any suggestions for a digital camera that would be easy enough for me to use as a beginner, but I'll want to keep using for a while?"
"I've started running at least 4 days each week, but I'm starting to get some knee and ankle pain. Any ideas about how to address this short of running less?"
"I need a good prank to play on my supervisor. She has a good sense of humor, but is overly professional. Any ideas?"
"I'm going to Berlin for two weeks and would like to take some day trips to places that aren't too touristy. Where should I go?"
"My friend's in town and wants to see live music. We both love bands like the Counting Crows. Any recommendations for shows (of any size) to check out?"
"I just moved and have the perfect spot for a plant in my living room. It gets a lot of light from the north and south, but I know I won't be too reliable with watering. Any suggestions for plants that won't die?"
"Is there any way to recover an unsaved Excel file that was closed manually on a Mac?"
"Should I wear brown or black shoes with a light brown suit?"
"I need to analyze a Spanish poem for class. What are some interesting Spanish poems that aren't too difficult to translate?"
"I'm putting together a focus group to talk about my brand new website. Any tips on making it as effective as possible?"
"I'm making cookies but ran out of baking powder. Is there anything I can substitute?"
"I always drive by men selling strawberries on Stanford Ave. How much do they charge per flat?"
"I have a job interview over lunch tomorrow. Is there any interview restaurant etiquette that I should know?"
"My girlfriend's ex bought her lots of expensive presents on anniversaries. I'm pretty broke, but want to show her that I care. Any ideas for things I could do that are not too cliche?"
"I want to give my friend something that lasts as a graduation present, but someone already gave her jewelry. What else could I give her?"

[Figure 6: Three complete Aardvark interactions]

EXAMPLE 1
(Question from Mark C./M/LosAltos,CA) I am looking for a restaurant in San Francisco that is open for lunch. Must be very high-end and fancy (this is for a small, formal, post-wedding gathering of about 8 people).
(+4 minutes -- Answer from Nick T./28/M/SanFrancisco,CA -- a friend of your friend Fritz Schwartz) fringale (fringalesf.com) in soma is a good bet; small, fancy, french (the french actually hang out there too). Lunch: Tuesday - Friday: 11:30am - 2:30pm
(Reply from Mark to Nick) Thanks Nick, you are the best PM ever!
(Reply from Nick to Mark) you're very welcome. hope the days they're open for lunch work...

EXAMPLE 2
(Question from James R./M/TwinPeaksWest,SF) What is the best new restaurant in San Francisco for a Monday business dinner? Fish & Farm? Gitane? Quince (a little older)?
(+7 minutes -- Answer from Paul D./M/SanFrancisco,CA -- A friend of your friend Sebastian V.) For business dinner I enjoyed Kokkari Estiatorio at 200 Jackson. If you prefer a place in SOMA i recommend Ozumo (a great sushi restaurant).
(Reply from James to Paul) thx I like them both a lot but I am ready to try something new
(+1 hour -- Answer from Fred M./29/M/Marina,SF) Quince is a little fancy... La Mar is pretty fantastic for cevice - like the Slanted Door of peruvian food...

EXAMPLE 3
(Question from Brian T./22/M/Castro,SF) What is a good place to take a spunky, off-the-cuff, social, and pretty girl for a nontraditional, fun, memorable dinner date in San Francisco?
(+4 minutes -- Answer from Dan G./M/SanFrancisco,CA) Start with drinks at NocNoc (cheap, beer/wine only) and then dinner at RNM (expensive, across the street).
(Reply from Brian to Dan) Thanks!
(+6 minutes -- Answer from Anthony D./M/Sunnyvale,CA -- you are both in the Google group) Take her to the ROTL production of Tommy, in the Mission. Best show i've seen all year!
(Reply from Brian to Anthony) Tommy as in the Who's rock opera? COOL!
(+10 minutes -- Answer from Bob F./M/Mission,SF -- you are connected through Mathias' friend Samantha S.) Cool question. Spork is usually my top choice for a first date, because in addition to having great food and good really friendly service, it has an atmosphere that's perfectly in between casual and romantic. It's a quirky place, interesting funny menu, but not exactly non-traditional in the sense that you're not eating while suspended from the ceiling or anything

This willingness to be helpful persists because when users do answer questions, they report that it is a very gratifying experience: they have been selected by Aardvark because of their expertise, they were able to help someone who had a need in the moment, and they are frequently thanked for their help by the asker.

In order to play the role of intermediary in an ongoing conversation, Aardvark must have some basic conversational intelligence in order to understand where to direct messages from a user: is a given message a new question, a continuation of a previous question, an answer to an earlier question, or a command to Aardvark?
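Aardvark's stated approach to this dispatch problem is a state machine over the discourse context. The sketch below is a hypothetical instance of that idea: the state names and transition rules are invented for illustration, and only the one-word commands ('sure', 'pass', 'why', etc.) appear in the paper's transcripts.

```python
# Hypothetical discourse-tracking sketch; states and rules are illustrative,
# not Aardvark's actual Conversation Manager.
COMMANDS = {"sure", "pass", "busy", "more", "why", "try"}

class Conversation:
    def __init__(self):
        self.state = "idle"  # idle | asked (own question open) | answering

    def classify(self, text):
        t = text.strip().lower()
        if t in COMMANDS:
            return "command"
        if self.state == "answering":
            return "answer"       # free text while a question awaits this user
        if self.state == "asked":
            return "follow-up"    # continuation of the user's own question
        return "new-question"

    def handle(self, text):
        kind = self.classify(text)
        if kind == "command" and text.strip().lower() == "sure":
            self.state = "answering"   # user agreed to see a question
        elif kind == "answer":
            self.state = "idle"        # answer relayed; exchange closes
        elif kind == "new-question":
            self.state = "asked"
        return kind
```

For example, after a user types 'sure', subsequent free text is treated as an answer to the relayed question rather than as a new question of their own.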
The details of how the Conversation Manager manages these complications and disambiguates user messages are not essential to the social search model, so they are not elaborated here; but the basic approach is to use a state machine to model the discourse context.

In all of the interfaces, wrappers around the messages from another user include information about the user that can facilitate trust: the user's Real Name nametag, with their name, age, gender, and location; the social connection between you and the user (e.g., "Your friend on Facebook", "A friend of your friend Marshall Smith", "You are both in the Stanford group", etc.); a selection of topics the user has expertise in; and summary statistics of the user's activity on Aardvark (e.g., number of questions recently asked or answered).

Finally, it is important throughout all of the above interactions that Aardvark maintains a tone of voice which is friendly, polite, and appreciative. A social search engine depends upon the goodwill and interest of its users, so it is important to demonstrate the kind of (linguistic) behavior that can encourage these sentiments, in order to set a good example for users to adopt. Indeed, in user interviews, users often express their desire to have examples of how to speak or behave socially when using Aardvark; since it is a novel paradigm, users do not immediately realize that they can behave in the same ways they would in a comparable real-world situation of asking for help and offering assistance. All of the language that Aardvark uses is intended both to be a communication mechanism between Aardvark and the user and an example of how to interact with Aardvark.

Overall, a large body of research [6, 1, 5, 16] shows that when you provide a 1-to-1 communication channel, use real identities rather than pseudonyms, facilitate interactions between existing real-world relationships, and consistently provide examples of how to behave, users in an online community will behave in a manner that is far more authentic and helpful than in pseudonymous multicasting environments with no moderators. The design of Aardvark's UI has been carefully crafted around these principles.

4. EXAMPLES

In this section we take a qualitative look at user behavior on Aardvark. Figure 5 shows a random sample of questions asked on Aardvark in this period. Figure 6 takes a closer look at three questions sent to Aardvark during this period, all three of which were categorized by the Question Analyzer under the primary topic "restaurants in San Francisco". (Names and affiliations have been changed to protect privacy.)

In Example 1, Aardvark opened 3 channels with candidate answerers, which yielded 1 answer. An interesting (and not uncommon) aspect of this example is that the asker and the answerer in fact were already acquaintances, though only listed as "friends-of-friends" in their online social graphs; and they had a quick back-and-forth chat through Aardvark.

In Example 2, Aardvark opened 4 channels with candidate answerers, which yielded two answers. For this question, the "referral" feature was useful: one of the candidate answerers was unable to answer, but referred the question along to a friend to answer. The asker wanted more choices after the first answer, and resubmitted the question to get another recommendation, which came from a user whose profile topics related to business and finance (in addition to dining).

In Example 3, Aardvark opened 10 channels with candidate answerers, yielding 3 answers. The first answer came from someone with only a distant social connection to the asker; the second answer came from a coworker; and the third answer came from a friend-of-friend-of-friend. The third answer, which is the most detailed, came from a user who has topics in his profile related to both "restaurants" and "dating".

One of the most interesting features of Aardvark is that it allows askers to get answers that are hypercustomized to their information need. Very different restaurant recommendations are appropriate for a date with a spunky and spontaneous young woman, a post-wedding small formal family gathering, and a Monday evening business meeting, and human answerers are able to recognize these constraints. It is also interesting to note that in most of these examples (as in the majority of Aardvark questions), the asker took the time to thank the answerer for helping out.

5. ANALYSIS

The following statistics give a picture of the current usage and performance of Aardvark. Aardvark was first made available semi-publicly in a beta release in March of 2009. From March 1, 2009 to October 20, 2009, the number of users grew to 90,361, having asked a total of 225,047 questions and given 386,702 answers. All of the statistics below are taken from the last month of this period (9/20/2009-10/20/2009).

[Figure 7: Aardvark user growth]
[Figure 8: Categories of questions sent to Aardvark, including restaurants & bars, local services, travel, product reviews & help, music/movies/TV/books, sports & recreation, home & cooking, finance & investing, technology & programming, business, research, websites & internet apps, and miscellaneous]
[Figure 9: Distribution of questions and answering times]

Aardvark is actively used: As of October 2009, 90,361 users have created accounts on Aardvark, growing organically from 2,272 users since March 2009. In this period, 50,526 users (55.9% of the user base) generated content on Aardvark (i.e., asked or answered a question), while 66,658 users (73.8% of the user base) passively engaged (i.e., either referred or tagged other people's questions). The average query volume was 3,167.2 questions per day in this period, and the median active user issued 3.1 queries per month. Figure 7 shows the number of users per month from early testing through October 2009.

Mobile users are particularly active: Mobile users had an average of 3.6322 sessions per month, which is surprising on two levels. First, mobile users of Aardvark are more active than desktop users. (As a point of comparison, on Google, desktop users are almost 3 times as active as mobile users [10].) Second, mobile users of Aardvark are almost as active in absolute terms as mobile users of Google (who have on average 5.68 mobile sessions per month [10]). This is quite surprising for a service that has only been available for 6 months. We believe this is for two reasons. First, browsing through traditional web search results on a phone is unwieldy; on a phone, it's more useful to get a single short answer that's crafted exactly to your query. Second, people are used to using natural language with phones, and so Aardvark's query model feels natural in that context. These considerations (and early experiments) also suggest that Aardvark mobile users will be similarly active with voice-based search.

Questions are highly contextualized: As compared to web search, where the average query length is between 2.2 and 2.9 words [10, 14], with Aardvark the average query length is 18.6 words (median=13). While some of this increased length is due to the increased usage of function words, 45.3% of these words are content words that give context to the query. In other words, as compared to traditional web search, Aardvark questions have 3-4 times as much context. The addition of context results in a greater diversity of queries. While in web search between 57% and 63% of queries are unique [14, 15], in Aardvark 98.1% of questions are unique (and 98.2% of answers are unique).
[Figure 10: Distribution of questions and number of answers received]
[Figure 11: Distribution of percentage of users and number of topics]

Questions often have a subjective element: A manual tally of 1,000 random questions between March and October of 2009 shows that 64.7% of queries have a subjective element to them (for example, "Do you know of any great delis in Baltimore, MD?" or "What are the things/crafts/toys your children have made that made them really proud of themselves?"). In particular, advice or recommendation queries regarding travel, restaurants, and products are very popular. A large number of queries are locally oriented: about 10% of questions related to local services, and 13% dealt with restaurants and bars. Figure 8 shows the top categories of questions sent to Aardvark. The distribution is not dissimilar to that found with traditional web search engines [2], but with a much smaller proportion of reference, factual, and navigational queries, and a much greater proportion of experience-oriented, recommendation, local, and advice queries.

Questions get answered quickly: 87.7% of questions submitted to Aardvark received at least 1 answer, and 57.2% received their first answer in less than 10 minutes. On average, a question received 2.08 answers (Figure 10), and the median answering time was 6 minutes and 37 seconds (Figure 9). (A question may receive more than one answer when the Routing Policy allows Aardvark to contact more than one candidate answerer in parallel for a given question, or when the asker resubmits their question to request a second opinion.) By contrast, on public question and answer forums such as Yahoo! Answers [7] most questions are not answered within the first 10 minutes, and for questions asked on Facebook, only 15.7% of questions are answered within 15 minutes [12]. (Of course, corpus-based search engines such as Google return results in milliseconds, but many of the types of questions that are asked of Aardvark require extensive browsing and query refinement when asked on corpus-based search engines.)

Answers are high quality: Aardvark answers are both comprehensive and concise. The median answer length was 22.2 words; 22.9% of answers were over 50 words (i.e., the length of a whole paragraph); and 9.1% of answers included hypertext links in them. 70.4% of the in-line feedback which askers provided on the answers they received rated the answers as 'good', 14.1% rated the answers as 'OK', and 15.5% rated the answers as 'bad'.

There is a broad range of answerers: 78,343 users (86.7% of users) have been contacted by Aardvark with a request to answer a question, and of those, 70% have asked to look at the question, and 38.0% have been able to answer. Additionally, 15,301 users (16.9% of all users) have contacted Aardvark of their own initiative to try answering a question (see Section 3.6 for an explanation of these two modes of answering questions). Altogether, 45,160 users (50.0% of the total user base) have answered a question; this is 75% of all users who interacted with Aardvark at all in the period (66,658 users). As a comparison, only 27% of Yahoo! Answers users have ever answered a question [7]. While a smaller portion of the Aardvark user base is much more active in answering questions (approximately 20% of the user base is responsible for 85% of the total number of answers delivered to date), the distribution of answers across the user base is far broader than on a typical user-generated-content site [7].

Social proximity matters: Of questions that were routed to somebody in the asker's social network (most commonly a friend of a friend), 76% of the inline feedback rated the answer as 'good', whereas for those answers that came from outside the asker's social network, 68% were rated as 'good'.

People are indexable: 97.7% of the user base has at least 3 topics in their profiles, and the median user has 9 topics in her profile (Figure 11). In sum, Aardvark users added 1,199,323 topics to their profiles; not counting overlapping topics, this yields a total of 174,605 distinct topics in which the current Aardvark user base has expertise. The currently most popular topics in user profiles in Aardvark are "music", "movies", "technology", and "cooking", but in general most topics are as specific as "logo design" and "San Francisco pickup soccer".

6. EVALUATION

To evaluate social search compared to web search, we ran a side-by-side experiment with Google on a random sample of Aardvark queries. We inserted a "Tip" into a random sample of active questions on Aardvark that read: "Do you want to help Aardvark run an experiment?" with a link to an instruction page that asked the user to reformulate their question as a keyword query and search on Google. We asked the users to time how long it took to find a satisfactory answer on both Aardvark and Google, and to rate the answers from both on a 1-5 scale. If it took longer than 10 minutes to find a satisfactory answer, we instructed the user to give up. Of the 200 responders in the experiment set, we found that 71.5% of the queries were answered successfully on Aardvark, with a mean rating of 3.93 (σ = 1.23), while 70.5% of the queries were answered successfully on Google, with a mean rating of 3.07 (σ = 1.46). The median time-to-satisfactory-response for Aardvark was 5 minutes (of passive waiting), while the median time-to-satisfactory-response for Google was 2 minutes (of active searching).

Of course, since this evaluation involves reviewing questions which users actually sent to Aardvark, we should expect that Aardvark would perform well; after all, users chose these particular questions to send to Aardvark because of their belief that it would be helpful in these cases. And in many cases, the users in the experiment noted that they sent their question to Aardvark specifically because a previous Google search on the topic was difficult to formulate or did not give a satisfactory result.

Thus we cannot conclude from this evaluation that social search will be equally successful for all kinds of questions. Further, we would assume that if the experiment were reversed, and we used as our test set a random sample from Google's query stream, the results of the experiment would be quite different. Indeed, for questions such as "What is the train schedule from Middletown, NJ to New York Penn Station?", traditional web search is a preferable option.

However, the questions asked of Aardvark do represent a large and important class of information need: they are typical of the kind of subjective questions for which it is difficult for traditional web search engines to provide satisfying results. The questions include background details and elements of context that specify exactly what the asker is looking for, and it is not obvious how to translate these information needs into keyword searches. Further, there are not always existing web pages that contain exactly the content that is being sought; and in any event, it is difficult for the asker to assess whether any content that is returned is trustworthy or right for them. In these cases, askers are looking for personal opinions, suggestions, recommendations, or advice, from someone they feel a connection with and trust. The desire to have a fellow human being understand what you are looking for and respond in a personalized manner in real time is one of the main reasons why social search is an appealing mechanism for information retrieval.

7. ACKNOWLEDGMENTS

We'd like to thank Max Ventilla and the entire Aardvark team for their contributions to the work reported here. We'd especially like to thank Rob Spiro for his assistance on this paper. And we are grateful to the authors of the original "Anatomy of a Large-Scale Hypertextual Web Search Engine" [3], from which we derived the title and the inspiration for this paper.

8. REFERENCES

[1] H. Bechar-Israeli. From <Bonehead> to <cLoNehEAd>: Nicknames, Play, and Identity on Internet Relay Chat. Journal of Computer-Mediated Communication, 1995.
[2] S. M. Bietzel, E. C. Jensen, A. Chowdhury, D. Grossman, and O. Frieder. Hourly analysis of a very large topically categorized web query log. In Proceedings of the 27th SIGIR, 2004.
[3] S. Brin and L. Page. The anatomy of a large-scale hypertextual Web search engine. In Computer Networks and ISDN Systems, 1998.
[4] T. Condie, S. D. Kamvar, and H. Garcia-Molina. Adaptive peer-to-peer topologies. In Proceedings of the Fourth International Conference on Peer-to-Peer Computing, 2004.
[5] A. R. Dennis and S. T. Kinney. Testing media richness theory in the new media: The effects of cues, feedback, and task equivocality. Information Systems Research, 1998.
[6] J. Donath. Identity and deception in the virtual community. Communities in Cyberspace, 1998.
[7] Z. Gyongyi, G. Koutrika, J. Pedersen, and H. Garcia-Molina. Questioning Yahoo! Answers. In Proceedings of the First WWW Workshop on Question Answering on the Web, 2008.
[8] T. Hofmann. Probabilistic latent semantic indexing. In Proceedings of the 22nd SIGIR, 1999.
[9] B. J. Jansen, A. Spink, and T. Sarcevic. Real life, real users, and real needs: a study and analysis of user queries on the web. Information Processing and Management, 2000.
[10] M. Kamvar, M. Kellar, R. Patel, and Y. Xu. Computers and iPhones and mobile phones, oh my!: a logs-based comparison of search users on different devices. In Proceedings of the 18th International Conference on World Wide Web, 2009.
[11] D. Klein, K. Toutanova, H. T. Ilhan, S. D. Kamvar, and C. D. Manning. Combining heterogeneous classifiers for word-sense disambiguation. In Proceedings of the SIGLEX/SENSEVAL Workshop on Word Sense Disambiguation, 2002.
[12] M. R. Morris, J. Teevan, and K. Panovich. What do people ask their social networks, and why? A survey study of status message Q&A behavior. In Proceedings of CHI, 2010.
[13] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Stanford Digital Libraries Working Paper, 1998.
[14] C. Silverstein, M. Henziger, H. Marais, and M. Moricz. Analysis of a very large Web search engine query log. In SIGIR Forum, 1999.
[15] A. Spink, B. J. Jansen, D. Wolfram, and T. Saracevic. From e-sex to e-commerce: Web search changes. IEEE Computer, 2002.
[16] L. Sproull and S. Kiesler. Computers, networks, and work. In Global Networks: Computers and International Communication. MIT Press, 1993.
[17] M. Wesch.
An anthropological introduction to For example: “Which golf courses in the San Francisco Bay Area have the best drainage / are the most playable YouTube. Library of Congress, June, 2008. during the winter (especially with all of the rain we’ve been getting)?”
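The comparison statistics reported in the evaluation (success rate under the 10-minute cutoff, mean rating with standard deviation, and median time-to-satisfactory-response) can be computed from raw experiment logs with a short script. The sketch below uses hypothetical per-query records and field names, not the actual Aardvark experiment data:

```python
from statistics import mean, median, pstdev

# Hypothetical experiment records: one tuple per query,
# (answered_within_10_min, rating_1_to_5, minutes_to_answer);
# rating and time are None when the user gave up.
aardvark_log = [(True, 5, 4), (True, 4, 6), (False, None, None), (True, 3, 5)]
google_log = [(True, 4, 1), (True, 2, 3), (True, 3, 2), (False, None, None)]


def summarize(log):
    """Success rate, mean/σ of ratings, and median minutes over answered queries."""
    answered = [(r, t) for ok, r, t in log if ok]
    ratings = [r for r, _ in answered]
    times = [t for _, t in answered]
    return {
        "success_rate": len(answered) / len(log),
        "mean_rating": mean(ratings),
        "rating_sigma": pstdev(ratings),
        "median_minutes": median(times),
    }


for name, log in [("Aardvark", aardvark_log), ("Google", google_log)]:
    s = summarize(log)
    print(f"{name}: {s['success_rate']:.1%} answered, "
          f"mean rating {s['mean_rating']:.2f} (sigma={s['rating_sigma']:.2f}), "
          f"median {s['median_minutes']} min to a satisfactory answer")
```

Note that a population standard deviation (`pstdev`) is used here; with the paper's sample sizes the difference from a sample standard deviation would be negligible.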