Azzopardi2012economics of iir_tech_talk

The Economics in
Interactive Information Retrieval

Leif Azzopardi
http://www.dcs.gla.ac.uk/~leif

Interaction
Cost
Benefit

Interactive and Iterative Search
A simplified, abstracted representation

[Diagram labels: Information Need, User, Queries, System, Documents Returned, Relevant Information]

Observational & Empirical
• Berry Picking
• IS&R
• ASK Framework
• Information Foraging Theory, Pirolli (1999)
Theoretical & Formal

Theoretical & Formal
A Major Research Challenge
Interactive Information Retrieval needs formal models to:
• describe, explain and predict the interaction of users with systems,
• provide a basis on which to reason about interaction,
• understand the relationships between interaction, performance and cost,
• help guide the design, development and research of information systems, and
• derive laws and principles of interaction.
Belkin (2008), Jarvelin (2011)

How do users behave?
• User queries tend to be short (only 2-3 terms)
• Users will often pose a series of short queries
• Web searchers typically only examine the first page of results
• Patent searchers typically examine 100-200 documents per query (using a Boolean system)
• Patent searchers usually express longer and complex queries
• Users adapt to degraded systems by issuing more queries
• Users rarely provide explicit relevance feedback
Why do users behave like this?

So why do users pose short queries?

User queries tend to be short

But longer queries tend to be more effective!

So why do users pose short queries?

[Figure: Performance against Query Length (No. of Terms), with Total Performance and Marginal Performance curves for query lengths from 0 to 30.]
• Exponentially diminishing returns kick in after 2 query terms
• Around 2-3 terms is where the user gets the most bang for their buck
Azzopardi (2009)

How can we use microeconomics to
model the search process?

Microeconomics

Production Theory

Consumer Theory

Utility Maximization

Cost Minimization

Production Theory
a.k.a. Theory of Firms

[Diagram: Inputs (Capital, Labor) go into The Firm, which produces Output (Widgets). The Firm utilizes Technology, and Technology constrains what can be produced.]
Varian (1987)

Production Functions

Production Function: Quantity = F(Capital, Labor)
[Figure: Capital against Labor, with curves for Quantity 1, Quantity 2 and Quantity 3.]

Production Set
Technology constrains the production set

Applying Production Theory to
Interactive Information Retrieval

Search as Production

[Diagram: Inputs (Queries, Assessments) go into The Firm, which produces Output (Relevance Gain). The Firm utilizes Search Engine Technology, and the technology constrains what can be produced.]

Search Production Function

The function represents how well a system could be used,
i.e. the min input required to achieve that level of gain.

Gain = F(Q,A)
[Figure: No. of Queries (Q) against No. of Assessments per Query (A), with curves for Gain = 10, Gain = 20 and Gain = 30.]

What strategies can the user employ
when interacting with the search system to achieve their end goal?
• Few Queries, Lots of Assessments?
• Lots of Queries, Few Assessments?
• Or some other way?

What is the most cost-efficient way for
a user to interact with an IR system?

Modeling Caveats
of an economic model of the search process

Abstracted
Simplified

Representative

What does the model
tell us about search & interaction?

Search Scenario
• Task: Find news articles about ….
• Goal: To find a number of relevant documents and reach
the desired level of Cumulative Gain.
• Output: Total Cumulative Gain (G) across the session
• Inputs:
– Y No. of Queries, and
– X No. of Assessments per Query
• Collections:
– TREC News Collections (AP, LA, Aquaint)
– Each topic had about 30 or more relevant documents
• Simulation: built using C++ and the Lemur IR toolkit

Simulating User Interaction
• Models: Probabilistic, Vector Space, Boolean
• TREC Aquaint Topics
• Queries generated from Relevant set (TREC Documents marked Relevant)
• Select the best query first/next
• Simulated User issues Y Queries of Length 3
• Simulated User assesses X Documents per Query
• Record X & Y for each level of gain
The simulation assumes the user has perfect information, in order to find out how well the system could be used.
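The simulated user described above can be sketched roughly as follows. This is a minimal illustration on invented toy data, not the author's C++/Lemur implementation: the names `simulate`, `ranked_lists` and `relevant` are hypothetical, and gain is assumed here to be the count of distinct relevant documents found.

```python
def simulate(ranked_lists, relevant, num_queries, depth):
    """Greedy simulated user with perfect information.

    ranked_lists: dict mapping a candidate query to its ranked document ids
                  (a hypothetical stand-in for a retrieval model's output).
    relevant:     set of document ids marked relevant for the topic.
    num_queries:  Y, the number of queries the user issues.
    depth:        X, the number of documents assessed per query.
    Returns the cumulative gain (distinct relevant documents found).
    """
    found = set()
    remaining = dict(ranked_lists)
    for _ in range(num_queries):
        if not remaining:
            break

        # Perfect information: score each query by the new relevant docs it adds.
        def new_gain(query):
            return len((set(remaining[query][:depth]) & relevant) - found)

        best = max(remaining, key=new_gain)
        found |= set(remaining.pop(best)[:depth]) & relevant
    return len(found)


# Toy data, invented for illustration only.
ranked_lists = {
    "q1": ["d1", "d2", "d9", "d3"],
    "q2": ["d4", "d9", "d5", "d1"],
    "q3": ["d9", "d8", "d6", "d7"],
}
relevant = {"d1", "d2", "d3", "d4", "d5", "d6"}

# Record the gain reached for each (Y queries, X assessments per query) pair.
gain_table = {
    (y, x): simulate(ranked_lists, relevant, y, x)
    for y in (1, 2, 3)
    for x in (2, 4)
}
```

Reading `gain_table` by gain level gives the input combinations (X, Y) that reach it, which is the kind of data a production curve is drawn from.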

Search Production Curves
Same Retrieval Model, Different Gain
TREC Aquaint Collection
[Figure: No. of Queries against No. of Assessments per Query, with curves for BM25 NCG=0.2 and BM25 NCG=0.4.]
• 8 Q & 15 A/Q gets NCG = 0.4
• 4 Q & 40 A/Q gets NCG = 0.4
• 7.7 Q & 5 A/Q gets NCG = 0.2
• 3.6 Q & 15 A/Q gets NCG = 0.2
To double the gain requires more than double the no. of assessments.
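The claim about doubling the gain can be checked with simple arithmetic on the four points quoted on this slide. The sketch below assumes each pair reads as (number of queries, assessments per query), so that the total number of assessments is their product; the variable names are illustrative.

```python
# (queries, assessments per query) pairs quoted on the slide, per gain level.
points = {
    0.2: [(7.7, 5), (3.6, 15)],
    0.4: [(8, 15), (4, 40)],
}

# Total documents assessed across the session for each strategy.
totals = {
    ncg: [queries * per_query for queries, per_query in combos]
    for ncg, combos in points.items()
}

# Ratio of total assessments at NCG = 0.4 to NCG = 0.2, strategy by strategy.
ratios = [high / low for low, high in zip(totals[0.2], totals[0.4])]
```

Under that reading, both ratios come out at roughly three, so doubling the gain costs about three times the assessments for these points.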

Search Production Curves
Different Retrieval Models, Same Gain
TREC Aquaint Collection
[Figure: No. of Queries against No. of Assessments per Query, with curves for BM25 NCG=0.4, BOOL NCG=0.4 and TFIDF NCG=0.4.]
• No input combinations with depth less than this (marked on the figure) are technically feasible!
• For the same gain, BOOL and TFIDF require a lot more interaction.
• BM25 provides more strategies (i.e. input combinations) than BOOL or TFIDF.
• User Adaption: more queries on the degraded systems
– BM25: 5 Q @ 25 A/Q
– BOOL: 10 Q @ 25 A/Q

Cobb-Douglas Production Function

$f(Q, A) = K \cdot Q^{\alpha} \cdot A^{(1-\alpha)}$

• Q: No. of queries issued
• A: No. of Assessments per query
• K: Efficiency of the technology used
• α: Mixing parameter determined by the technology

Model   K      α      Goodness of Fit
BM25    5.39   0.58   0.995
BOOL    3.47   0.58   0.992
TFIDF   1.69   0.50   0.997
Example Values on Aquaint when NCG = 0.6
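To make the fitted function concrete, here is a small sketch that evaluates it with the K and α values from the table and inverts it for Q. The function names are hypothetical, and the units of gain are an assumption, since the slide does not state the scale the fit was made on.

```python
# Fitted parameters from the table (Aquaint, NCG = 0.6): model -> (K, alpha).
FITS = {"BM25": (5.39, 0.58), "BOOL": (3.47, 0.58), "TFIDF": (1.69, 0.50)}


def gain(model, queries, assessments):
    """Cobb-Douglas search function f(Q, A) = K * Q**alpha * A**(1 - alpha)."""
    k, alpha = FITS[model]
    return k * queries**alpha * assessments ** (1 - alpha)


def queries_needed(model, target_gain, assessments):
    """Invert f for Q: queries required to reach target_gain at a given depth."""
    k, alpha = FITS[model]
    return (target_gain / (k * assessments ** (1 - alpha))) ** (1 / alpha)


# Same inputs on every model: 5 queries, 25 assessments per query.
same_inputs = {model: gain(model, 5, 25) for model in FITS}
```

Because K is largest for BM25, the same inputs yield more gain on BM25 than on BOOL or TFIDF, and `queries_needed` traces out one production curve as the depth varies.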

Using the Cobb-Douglas Search Function
We can differentiate the function to find the rates of change of the input variables

Marginal Product of Querying: $\partial f(Q, A) / \partial Q$
– the change in gain over the change in querying
– i.e. how much more gain do we get if we pose extra queries
Marginal Product of Assessing: $\partial f(Q, A) / \partial A$
– the change in gain over the change in assessing
– i.e. how much more gain do we get if we assess extra documents
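For reference, differentiating the Cobb-Douglas form given earlier in the talk yields the two marginal products in closed form. This is a standard derivation written out as a worked step; it is not shown on the slides themselves.

```latex
\begin{align*}
f(Q, A) &= K \, Q^{\alpha} A^{1-\alpha} \\
\frac{\partial f}{\partial Q} &= \alpha K \, Q^{\alpha-1} A^{1-\alpha}
  = \alpha \, \frac{f(Q, A)}{Q} \\
\frac{\partial f}{\partial A} &= (1-\alpha) K \, Q^{\alpha} A^{-\alpha}
  = (1-\alpha) \, \frac{f(Q, A)}{A}
\end{align*}
```

Both marginal products fall as their own input grows (for 0 < α < 1), which is the diminishing-returns behaviour the model relies on.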

Technical Rate of Substitution
How many more assessments per query are needed, if one less query was posed?

TRS of Assessments for Queries: $TRS(A, Q) = \partial A / \partial Q$
[Figure: No. of Queries against No. of Assessments per Query for BM25 NCG=0.4, with TRS values of 0.4, 1.2, 2.5, 4.2 and 8.3 marked along the curve.]
At the point marked 1.2, if you gave up one query you'd need to assess 1.2 extra docs/query.
EXAMPLE: If 5 queries are submitted, instead of 6, then 24.2 docs/query need to be assessed, instead of 20 docs/query.
• 6 Q @ 20 A/Q = 120 A
• 5 Q @ 24.2 A/Q = 121 A
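If the production curve is assumed to follow the Cobb-Douglas form used in this talk, the TRS has a simple closed form. The derivation below holds the gain fixed at a level G (a symbol introduced here for illustration) and solves for A. The values marked on the figure come from the plotted curve, so they need not match this expression exactly.

```latex
\begin{align*}
K \, Q^{\alpha} A^{1-\alpha} = G
  \;&\Rightarrow\;
  A = \left( \frac{G}{K \, Q^{\alpha}} \right)^{\frac{1}{1-\alpha}} \\
\frac{\partial A}{\partial Q}
  &= -\frac{\alpha}{1-\alpha} \cdot \frac{A}{Q}
\end{align*}
```

The negative sign says that giving up queries must be paid for with extra assessments per query; the slide quotes the magnitude.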

What about the cost of
interaction?

User Search Cost Function
A linear cost function

$c(Q, A) = \beta \cdot Q + Q \cdot A$

• Q: No. of queries issued
• A: No. of Assessments per query
• β: Relative cost of a Query to an Assessment
• Q·A: Total no. of documents assessed

What is the relative cost of a query?
Using cognitive costs of querying and assessing taken
from Gwizdka (2010):
• The average cost of querying was 2628 ms
• The average cost of assessing was 2266 ms
• So β was set to 2628/2266 = 1.1598
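A short sketch of the linear cost function with the β value stated above. The function name `cost` is hypothetical; cost is expressed in units of one assessment, and the two strategies compared are the 6-query and 5-query combinations from the TRS example.

```python
BETA = 1.1598  # relative cost of a query to an assessment, as stated on the slide


def cost(queries, assessments_per_query, beta=BETA):
    """Linear cost c(Q, A) = beta * Q + Q * A, in units of one assessment."""
    return beta * queries + queries * assessments_per_query


# The two strategies from the TRS example.
six_queries = cost(6, 20)     # 6 Q @ 20 A/Q
five_queries = cost(5, 24.2)  # 5 Q @ 24.2 A/Q
```

With β close to 1, the query term adds only a few units to the total, so the cost is dominated by the number of documents assessed.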

Cost Efficient Strategies
BM25 0.4 and 0.6 Gains
[Figure, top panel: No. of Queries against No. of Assessments per Query, with curves BM25@0.4 and BM25@0.6. Bottom panel: Cost against No. of Assessments per Query, with the Minimum Cost marked.]
On BM25, to increase gain pose more queries, but examine the same no. of docs per query.
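One way to locate a minimum-cost strategy is to walk along a production curve and evaluate the cost at each depth. The sketch below does this under several assumptions: the curve follows the Cobb-Douglas form with the BM25 values from the fit table (K = 5.39, α = 0.58), β = 1.1598, and the target gains of 20 and 40 are hypothetical numbers chosen for illustration. It does not reproduce the plotted curves.

```python
def cheapest_strategy(k, alpha, target_gain, beta=1.1598, max_depth=300):
    """Scan depths A = 1..max_depth along the curve f(Q, A) = target_gain.

    Returns (cost, queries, depth) for the cheapest input combination found.
    """
    best = None
    for depth in range(1, max_depth + 1):
        # Queries needed at this depth, from K * Q**alpha * A**(1 - alpha) = gain.
        queries = (target_gain / (k * depth ** (1 - alpha))) ** (1 / alpha)
        # Linear cost c(Q, A) = beta * Q + Q * A.
        total = beta * queries + queries * depth
        if best is None or total < best[0]:
            best = (total, queries, depth)
    return best


low = cheapest_strategy(5.39, 0.58, target_gain=20)   # hypothetical target
high = cheapest_strategy(5.39, 0.58, target_gain=40)  # hypothetical target
```

In this sketch the cheapest depth is the same for both targets while the number of queries rises, which is the pattern described for BM25. That invariance follows from the assumed functional form with a fixed α, so it should be read as a property of the model rather than of the measured data.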

Cost Efficient Strategies
BOOL 0.4 & 0.6 Gains
[Figure, top panel: No. of Queries against No. of Assessments per Query, with curves BOOL@0.4 and BOOL@0.6. Bottom panel: Cost against No. of Assessments per Query, with the Minimum Cost marked.]
On Boolean, to increase gain, issue about the same no. of queries, but examine more docs per query.

Contrasting Systems
BM25 0.4 and 0.6 Gains, BOOL 0.4 and 0.6 Gains
[Figure: side-by-side panels for BM25 and BOOL at gains 0.4 and 0.6. Top: No. of Queries against No. of Assessments per Query. Bottom: Cost against No. of Assessments per Query.]
• On BM25 issue more queries, but examine fewer docs per query
• BM25 is less costly to use than BOOL

A Hypothetical Experiment
What happens if
• Querying costs go down? More queries issued, decrease in assessments per query
• Querying costs go up? Decrease in queries issued, increase in assessments per query

Changing the Relative Query Cost

$c(Q, A) = \beta \cdot Q + Q \cdot A$

[Figure: Cost plot.]
As β increases, the relative cost of querying goes up: it is cheaper to assess more documents per query and consequently query less!
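The effect of β can be worked out analytically if the production curve is assumed to follow the Cobb-Douglas form used in this talk. The derivation below minimises the linear cost along a curve of fixed gain G; the symbols G, A* and Q* are introduced here for illustration, and the result only applies when α > 1/2.

```latex
\begin{align*}
\text{minimise } c(Q, A) &= Q \, (\beta + A)
  \quad \text{subject to} \quad K \, Q^{\alpha} A^{1-\alpha} = G \\
Q &= \left( \frac{G}{K} \right)^{\frac{1}{\alpha}} A^{-\frac{1-\alpha}{\alpha}}
  \;\Rightarrow\;
  c(A) = \left( \frac{G}{K} \right)^{\frac{1}{\alpha}}
         A^{-\frac{1-\alpha}{\alpha}} \, (\beta + A) \\
\frac{dc}{dA} = 0
  \;&\Rightarrow\; (1-\alpha)(\beta + A) = \alpha A
  \;\Rightarrow\; A^{*} = \frac{(1-\alpha)\,\beta}{2\alpha - 1} \\
Q^{*} &= \left( \frac{G}{K} \right)^{\frac{1}{\alpha}}
         \left( A^{*} \right)^{-\frac{1-\alpha}{\alpha}}
\end{align*}
```

Under these assumptions A* grows in proportion to β and Q* shrinks as A* grows, which agrees with the statement that costlier queries lead to more assessments per query and fewer queries.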


Implications for Design
• Knowing how benefit, interaction and cost
relate can help guide how we design systems
– We can theorize about how changes to the
system will affect the user’s interaction
• Is this desirable? Do we want the user to query more?
Or for them to assess more?
– We can categorize the type of user
• Is this a savvy rational user? Or is this a user behaving
irrationally?
– We can scrutinize the introduction of new features
• Are they going to be of any use? Are they worth it for
the user? i.e. how much more performance, or how
little must they cost?

Future Directions
• Validate the theory by conducting
observational & empirical research
– Do the predictions about user behavior hold?
• Incorporate other inputs into the model
– Find Similar, Relevance Feedback, Browsing,
– Query length, Query Type, etc
• Develop more accurate cost functions
– Obtain Better Estimates of Costs
• Model other search tasks

Questions

Contact Details

Email: Leifos@acm.org

Skype: Leifos

Twitter: @leifos

Selected References
• Varian, H., Intermediate Microeconomics, 1987
• Varian, H., Economics and Search, ACM SIGIR Forum, 1999
• Pirolli, P., Information Foraging Theory, 1999
• Belkin, N., Some (what) grand challenges of Interactive
Information Retrieval, ACM SIGIR Forum, 2008
• Azzopardi, L., Query Side Evaluation, ACM SIGIR 2009
– http://dl.acm.org/citation.cfm?doid=1571941.1572037

• Azzopardi, L., The Economics of Interactive Information
Retrieval, ACM SIGIR 2011
– http://dl.acm.org/citation.cfm?doid=2009916.2009923

• Jarvelin, K., IR Research:
Systems, Interaction, Evaluation and Theories, ACM
SIGIR Forum, 2011

Example

G = F(X, Y)
[Figure: axes Interaction X and Interaction Y.]

Example application for web search

P@10 = F(L,A)
[Figure: Length of Query (L) against No. of Assessments (A), with curves for P@10 = 0.1, P@10 = 0.2 and P@10 = 0.3.]
