Search advertising is the only type of online advertising that consistently provides value to users. Swoop is a search advertising company that uses ElasticSearch at the core of its offering. This presentation is from a talk Swoop founder & CTO Sim Simeonov gave at the Boston ElasticSearch meetup.
11. Display Advertising vs. Search Advertising
Display Advertising:
High volume
Low quality
Does not optimize for users
Low engagement
16% of users click
1 in 1,200 ads clicked
Search Advertising:
Low volume
High quality
Optimizes for users
High engagement
80% of users click
1 in 40 ads clicked
23. What is a keyword?
A string
e.g., canon d70
A type: specifies when a keyword matches
e.g., positive phrase
9 types: each with own analysis pipeline
Inherited filtering criteria
e.g., US-only traffic
also negative keywords
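As a rough illustration of the three parts listed on this slide (string, type, inherited filtering criteria), a keyword could be modeled as below. This is a sketch under assumptions, not Swoop's schema: the names `Keyword`, `KeywordType`, `countries`, `negative_keywords` and `matches` are hypothetical, the analysis pipeline is assumed to be lowercasing plus whitespace splitting, and only the "positive phrase" type is modeled because the slide does not name the other eight.

```python
from dataclasses import dataclass
from enum import Enum


class KeywordType(Enum):
    # Only "positive phrase" is named on the slide; the remaining types
    # are not listed there, so they are left out rather than guessed.
    POSITIVE_PHRASE = "positive_phrase"


@dataclass
class Keyword:
    text: str                            # the keyword string, e.g. "canon d70"
    type: KeywordType                    # decides when the keyword matches
    # Inherited filtering criteria (hypothetical field names).
    countries: frozenset = frozenset()   # empty set means "no restriction"
    negative_keywords: tuple = ()


def tokens(s):
    # Assumed analysis pipeline: lowercase, then split on whitespace.
    return s.lower().split()


def contains_phrase(haystack, needle):
    # True when `needle` occurs as a contiguous run of tokens in `haystack`.
    n = len(needle)
    return n > 0 and any(
        haystack[i:i + n] == needle for i in range(len(haystack) - n + 1)
    )


def matches(kw, query, country):
    q = tokens(query)
    if kw.countries and country not in kw.countries:
        return False
    if any(contains_phrase(q, tokens(neg)) for neg in kw.negative_keywords):
        return False
    if kw.type is KeywordType.POSITIVE_PHRASE:
        return contains_phrase(q, tokens(kw.text))
    return False
```

With a US-only keyword "canon d70" that carries the negative keyword "used", `matches` accepts "best Canon D70 lens" from US traffic and rejects the same query from other countries, as well as "used canon d70".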
25. Keyword doc schema
Many possible schemas
Query dependent
One type vs. many types
Query depends on matching model
26. Matching models
Two main approaches
Boolean matching
IR matching
No time to discuss this
Gets very geeky/math-y very quickly
27. Boolean Query Pattern
for all keyword document fields i, AND together
(
“does not have field i” OR
(
“has field i” AND
“field i satisfies the user query”
)
)
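The pattern above can be sketched as an Elasticsearch query body built in Python. This is an illustration under assumptions: "field i satisfies the user query" is modeled as an exact `term` match, the fields are taken from a hypothetical `user_context` dict, and the helper names `field_clause`, `build_query` and `doc_matches` are invented here. The body uses the `bool`, `exists` and `term` queries of the current query DSL; a deployment from the time of the talk would more likely have used the equivalent filters.

```python
def field_clause(field, user_value):
    """(field is absent) OR (field is present AND it matches the user value)."""
    return {
        "bool": {
            "should": [
                # "does not have field i"
                {"bool": {"must_not": [{"exists": {"field": field}}]}},
                # "has field i" AND "field i satisfies the user query"
                {"bool": {"must": [
                    {"exists": {"field": field}},
                    {"term": {field: user_value}},
                ]}},
            ],
            "minimum_should_match": 1,
        }
    }


def build_query(user_context):
    """AND together one clause per field (sorted for a stable order)."""
    return {"query": {"bool": {"must": [
        field_clause(f, v) for f, v in sorted(user_context.items())
    ]}}}


def doc_matches(doc, user_context):
    """The same logic evaluated in memory against a plain dict document."""
    return all(f not in doc or doc[f] == v for f, v in user_context.items())
```

The point of the "does not have field i" branch is that a keyword document with no restriction on a field stays eligible, while a document that does set the field must agree with the request.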
28. Keyword ranking
Generalized second-price auctions with revenue ordering, minimum prices and user value feedback, tuned for locally envy-free equilibria
P.S. Tends to work best when the moon is full
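For readers unfamiliar with the terms, here is a minimal textbook-style sketch of a generalized second-price auction with revenue ordering and a minimum price. It is not Swoop's ranking algorithm: user value feedback and the envy-free tuning are not modeled, `quality` is assumed to be a positive estimated click probability supplied from elsewhere, and the names `Bid` and `run_auction` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    advertiser: str
    bid: float       # maximum price per click
    quality: float   # estimated click probability, assumed > 0


def run_auction(bids, slots, min_price):
    # Bids below the minimum price are not eligible at all.
    eligible = [b for b in bids if b.bid >= min_price]
    # Revenue ordering: rank by expected revenue per impression.
    ranked = sorted(eligible, key=lambda b: b.bid * b.quality, reverse=True)
    results = []
    for i, b in enumerate(ranked[:slots]):
        if i + 1 < len(ranked):
            nxt = ranked[i + 1]
            # Second price: the smallest per-click price that still keeps
            # this bid ranked at or above the next one.
            price = nxt.bid * nxt.quality / b.quality
        else:
            price = min_price
        # Never charge less than the minimum price.
        results.append((b.advertiser, max(price, min_price)))
    return results
```

Each winner pays just enough to hold its position rather than its own bid, so the price charged never exceeds the bid.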
29. Search relevance is not enough
"Terrorism: Pursue a certificate in terrorism 100% online.
Enroll today. Ads by Google."
30. Custom ranking algorithm
Balance expected “value” trade-offs
User: engagement w/o WTF moments
Advertiser: performance
Publisher/network: revenue
Need external data
CTRs, bounce rates, share of budget, …
Frequent updates to this data
31. Problem
Lucene not suited for external data access
Expensive to add data to indexes
update == delete + add
33. General map/reduce with ES
elasticsearch-facet-script
on each shard node
init_script: run once
map_script: run per result
combine_script: run w/ shard results
on the aggregation node
reduce_script: sees all results
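The order in which the four scripts run can be simulated in plain Python. This sketch does not call the plugin and does not reproduce its API; it only mirrors the flow listed above, with shards represented as lists of hits. The function `run_script_facet` and the `clicks` and `impressions` fields are hypothetical.

```python
def run_script_facet(shards, init, map_fn, combine, reduce_fn):
    shard_outputs = []
    for hits in shards:                        # on each shard node
        state = init()                         # init_script: run once
        for hit in hits:
            map_fn(state, hit)                 # map_script: run per result
        shard_outputs.append(combine(state))   # combine_script: shard results
    return reduce_fn(shard_outputs)            # reduce_script: sees all results


def add_hit(state, hit):
    # Accumulate per-shard totals in the shard-local state.
    state["clicks"] += hit["clicks"]
    state["impressions"] += hit["impressions"]


def overall_ctr(shard_outputs):
    # Each shard output is a (clicks, impressions) pair.
    clicks = sum(c for c, _ in shard_outputs)
    impressions = sum(i for _, i in shard_outputs)
    return clicks / impressions if impressions else 0.0


shards = [
    [{"clicks": 1, "impressions": 40}, {"clicks": 0, "impressions": 40}],
    [{"clicks": 3, "impressions": 80}],
]
ctr = run_script_facet(
    shards,
    init=lambda: {"clicks": 0, "impressions": 0},
    map_fn=add_hit,
    combine=lambda s: (s["clicks"], s["impressions"]),
    reduce_fn=overall_ctr,
)
```

Only the small combined value from each shard travels to the aggregation node, which is what keeps this style of computation cheap across a cluster.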
38. Build a “query” from the page
Same two models as before
Phrase extraction (boolean)
IR matching
Common tools
Text analysis/summarization
Language modeling
Often involves indexing the pages
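One simple way to picture the phrase extraction (boolean) model is to scan the page's word n-grams for phrases that appear in a known keyword list. The slide does not say how extraction is actually done, so this is only an assumed approach; `extract_phrases`, `known_keywords` and `max_len` are hypothetical names.

```python
import re


def extract_phrases(page_text, known_keywords, max_len=3):
    # Assumed analysis: lowercase and keep alphanumeric runs as words.
    words = re.findall(r"[a-z0-9]+", page_text.lower())
    known = {tuple(k.lower().split()) for k in known_keywords}
    found = []
    for n in range(max_len, 0, -1):            # try longer phrases first
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            phrase = " ".join(gram)
            if gram in known and phrase not in found:
                found.append(phrase)           # record each phrase once
    return found
```

Each extracted phrase can then be treated as a separate query against the keyword index, which is how one page can yield many queries.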
39. There is a catch
AdWords on GDN (Google Display Network) performs
3-10x worse than AdWords on SERP (search engine results pages)
43. Swoop solves these problems
Unique real-time extraction & placement
browser/app, Web/mobile
100+ patent claims
A single page can generate 50+ queries
Pixel-perfect placement in content
If there is nothing to say, we say nothing
44. Some metrics
3 x 3 x 3 ES deployment
data, master, client nodes
5,000+ requests per second
< 5 ms query execution time
ElasticSearch, Lucene & Redis are fast!
45. Rewards for solving problems
A big sense of accomplishment
Business doubling quarter over quarter
Users getting better content
Bigger, harder, more important problems
46. Swoop’s future with ES
Deeper into Lucene
More machine learning in ES map/reduce
Better query rewriting engine
Better content enhancement engine
Probabilistic synchronized sharding
Much bigger clusters