This document discusses search architecture and optimization for e-commerce platforms. It describes how search is a critical feature that powers recommendations and sales. Key challenges include large catalogs that change frequently, diverse user needs like geo-specific ranking, and balancing multiple objectives. The document outlines the technical infrastructure supporting search, including serving architecture, indexing workflows, and approaches to improve quality like query understanding and personalization.
E-commerce Search Platform Architecture and Quality
1.
2.
3. ● Search is one of the most important discovery tools in e-commerce.
● Powers other features like merchandising (promotions), recommendations, etc.
● Accounts for a big fraction of the units sold and GMV.
4. ● Important signals that affect search: price, offers, popularity, availability, serviceability, etc.
● Used in ranking of products.
● Exposed as filters and sorts to end users.
● These signals are very dynamic, particularly during sales.
5. ● E-commerce search != web search.
● Documents have a structure to them
● Queries have an implicit structure
● Challenges:
○ Large document collection with a long heavy tail
○ Extremely high rate of changes/updates (Thousands per sec)
○ Geo specific ranking
○ Multi-objective optimization (GMV, Units, Ads revenue, Long Term Value)
● Opportunities:
○ Broad queries: personalization can play a huge role
6. ● Queries: XXX million / week
● Latencies:
○ Average: ~ 100 ms
○ Median: ~ 50 ms
○ 90th percentile: ~ 500 ms
● Documents retrieved and scored from index:
○ Median: 1K to 10K
○ 95th percentile: 200K to 500K
○ 99th percentile: 500K to 3M+
● Search CTR: Around 50%
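An average (~100 ms) that is about twice the median (~50 ms) suggests a heavy-tailed latency distribution, where a small share of expensive queries pulls the mean up. A minimal sketch of how these three statistics are computed, using a synthetic sample (the numbers below are made up, not the platform's data) and a nearest-rank percentile:

```python
import statistics

def latency_summary(latencies_ms):
    """Return mean, median and 90th percentile (nearest rank) of latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    # Nearest-rank p90: smallest value with at least 90% of samples at or below it.
    rank = (90 * n + 99) // 100  # integer ceil(0.9 * n)
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p90": ordered[rank - 1],
    }

# Synthetic sample: most queries are fast, two heavy ones dominate the mean.
sample = [40, 45, 50, 50, 55, 60, 60, 70, 500, 900]
summary = latency_summary(sample)
```

In this sample the mean is more than three times the median, which is why percentiles are usually reported alongside the average.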
7. ● Architectural overview of the search platform
○ Serving and Ingestion
○ Serving functional view
○ Serving architectural view
○ Ingestion architectural view
○ Example ingestion topology
● Search quality
○ Challenges
○ Life of a query: Typical flow for query understanding
○ Illustrative problems
15. ● Marketplace
○ Catalog entries vary in quality from seller to seller. Spam is rampant.
● Diversity of users
● Mobile-heavy users: limited real estate on the UI
● Poor internet connectivity
16. ● Literacy/Internet awareness
● Language
● Economic power
● Regional preferences
Abstraction: City-tier
Query/Intent Solicitation
Result Presentation
Product Ranking
17. 40% increase in proportion of tier-3 customers vis-a-vis metro
18. Query: samsang
Relative ratio of query Tier-3 Vs Metro: 1.8
Query: jins
Relative ratio of query Tier-3 Vs Metro: 2.2
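The slides do not define the "relative ratio". One plausible reading, assumed here, is the query's share of tier-3 traffic divided by its share of metro traffic, so that a value above 1 means the query is over-represented in tier-3 cities. A sketch under that assumption, with made-up counts:

```python
def relative_ratio(query, tier3_counts, metro_counts):
    """Share of `query` in tier-3 traffic divided by its share in metro traffic.

    This definition is an assumption; the slides only report the resulting ratio.
    """
    tier3_share = tier3_counts.get(query, 0) / sum(tier3_counts.values())
    metro_share = metro_counts.get(query, 0) / sum(metro_counts.values())
    if metro_share == 0:
        return float("inf") if tier3_share > 0 else 0.0
    return tier3_share / metro_share

# Made-up query counts, for illustration only.
tier3 = {"samsang": 18, "samsung": 982}
metro = {"samsang": 10, "samsung": 990}
ratio = relative_ratio("samsang", tier3, metro)
```

Normalising by total traffic per tier keeps the ratio comparable even when one tier issues far more queries than the other.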
19.
20. Query
Scoring
Normalisation (Index time as well)
- String clean-up
- Lowercasing
Spell Correction
- Resource-based
- term->term
- Query->query
- Online
Init
Context
Phrasing (Index time as well)
- Frequent bi/tri grams
Stemming (Index time as well)
- Core e-commerce stemmer
- plurals
Common MetaData Store (Query Level)
- Raw Data: metrics (CTR, Impression, NDCG…)
- Derived Data: Store, LM score, Features
Synonyms
- Resource-based
Intent
- Deductions
- Tagging (CRF)
Query Rewrite
- Best query selection
- Partial match
SOLR interface
Query Understanding
Output Generator
Retrieval
ranking
logic
Store Classifier
Query LM
Feature Store
Classification
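The first stages of this flow (normalisation, resource-based spell correction, phrasing, stemming) can be sketched as a chain of small functions. This is an illustrative sketch only: the resources `SPELL_MAP` and `FREQUENT_BIGRAMS`, the function names, and the naive plural stripping are all assumptions standing in for components the slides name but do not describe.

```python
import re

# Hypothetical resources; a real system would mine these from catalog and query logs.
SPELL_MAP = {"samsang": "samsung"}                   # term -> term corrections
FREQUENT_BIGRAMS = {("red", "tape"), ("pack", "of")}

def normalise(query):
    """String clean-up and lowercasing, then whitespace tokenisation."""
    cleaned = re.sub(r"[^a-z0-9\s-]", " ", query.lower())
    return cleaned.split()

def spell_correct(tokens):
    """Resource-based term -> term correction."""
    return [SPELL_MAP.get(t, t) for t in tokens]

def phrase(tokens):
    """Join adjacent tokens that form a known frequent bigram."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) in FREQUENT_BIGRAMS:
            out.append(tokens[i] + " " + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def stem(token):
    """Naive plural stripping; a stand-in for the e-commerce stemmer."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token

def understand(query):
    tokens = phrase(spell_correct(normalise(query)))
    # Phrases are kept intact; only single tokens are stemmed.
    return [t if " " in t else stem(t) for t in tokens]
```

For example, `understand("Red Tape SHOES!")` yields `["red tape", "shoe"]`. Since normalisation, phrasing and stemming are marked "index time as well", the same functions would need to run over product text so that query and document terms match.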
21. • Special patterns:
– Segmented words: lgnexus5
• Counting: “samsang” & no-click followed by “samsung” & click, a million times
– Context-aware counting
• Language modeling and edit distance
• Term to vector models in deep learning.
Specific
General
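The counting idea combines naturally with edit distance: count how often a query with no click is immediately followed, in the same session, by a close-by query that does get a click. A minimal sketch, assuming a hypothetical session format of `(query, clicked)` pairs and illustrative thresholds:

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def mine_corrections(sessions, max_distance=2, min_count=2):
    """Map each unclicked query to its most frequent clicked follow-up."""
    pairs = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if (not clicked1 and clicked2 and q1 != q2
                    and edit_distance(q1, q2) <= max_distance):
                pairs[(q1, q2)] += 1
    best = {}
    for (q1, q2), count in pairs.most_common():  # highest counts first
        if count >= min_count and q1 not in best:
            best[q1] = q2
    return best

# Made-up sessions, for illustration only.
sessions = [
    [("samsang", False), ("samsung", True)],
    [("samsang", False), ("samsung", True)],
    [("shoes", False), ("watches", True)],
]
corrections = mine_corrections(sessions)
```

The edit-distance filter drops unrelated reformulations such as "shoes" followed by "watches", and the minimum count guards against one-off noise.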
22. ● Intent: From query tokens to (implicit) attributes that are represented by those tokens
● Examples:
○ “red tape shoes” -> (brand) “red tape” (store) “shoes”
○ “kids party dress 4-5 years pack of 2” -> (ideal_for) “kids” (occasion) “party” (store) “dress” (size) “4-5 years” (pack_of) “pack of 2”
○ “samsung e6 cases” -> (compatible_with) “samsung e6” (store) “cases”
● Memorization, Language modeling, CRF
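Of the three approaches, memorization is the simplest to illustrate: look up the longest known phrase at each position in a dictionary of attribute values. The sketch below is an assumption about how such a tagger could look; the `LEXICON` contents are hypothetical and simply mirror the examples above.

```python
# Hypothetical attribute dictionary; in practice it would be built from the catalog.
LEXICON = {
    ("red", "tape"): "brand",
    ("samsung", "e6"): "compatible_with",
    ("kids",): "ideal_for",
    ("party",): "occasion",
    ("shoes",): "store",
    ("cases",): "store",
    ("dress",): "store",
}
MAX_PHRASE_LEN = max(len(key) for key in LEXICON)

def tag_intent(query):
    """Greedy longest-match tagging; unknown tokens get the tag None."""
    tokens = query.lower().split()
    tags, i = [], 0
    while i < len(tokens):
        for n in range(min(MAX_PHRASE_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in LEXICON:
                tags.append((LEXICON[candidate], " ".join(candidate)))
                i += n
                break
        else:
            # No dictionary entry starts at this token.
            tags.append((None, tokens[i]))
            i += 1
    return tags
```

Here `tag_intent("red tape shoes")` returns `[("brand", "red tape"), ("store", "shoes")]`. A pure lookup cannot use context, for instance to decide whether "red" is a colour or part of a brand name, which is presumably where the language-model and CRF taggers come in.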
23. Past orders Product Views
Users’ activity on the platform
Customised Search Ranking for User-segment
24. Price Personalization
● 5 price ranges defined for each vertical (e.g. shoes, watches): 1 (economical) to 5 (expensive).
● User-segments based on price affinities, from users’ past activity on the platform (past orders, product views).
● Customised search ranking for each user-segment.
[Figure: # of users vs. price range, for shoes and watches]
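A rough sketch of price-affinity segmentation and re-ranking follows. Everything specific in it is an assumption for illustration: the bucket boundaries in `PRICE_EDGES`, the rule that a user's segment is their most frequent bucket, and the linear penalty used in re-ranking.

```python
import bisect
from collections import Counter

# Hypothetical upper price edges of ranges 1-4 for each vertical; range 5 is open-ended.
PRICE_EDGES = {
    "shoes": [500, 1000, 2000, 4000],
    "watches": [1000, 2500, 5000, 10000],
}

def price_bucket(vertical, price):
    """Map a price to one of 5 ranges: 1 (economical) to 5 (expensive)."""
    return bisect.bisect_left(PRICE_EDGES[vertical], price) + 1

def user_segment(vertical, past_prices):
    """Segment = the price range the user ordered or viewed most often."""
    buckets = Counter(price_bucket(vertical, p) for p in past_prices)
    return buckets.most_common(1)[0][0]

def rerank(vertical, results, segment, penalty=0.2):
    """results: list of (product_id, price, base_score).

    Lowers each score in proportion to the distance between the
    product's price range and the user's segment.
    """
    def score(item):
        _, price, base = item
        return base - penalty * abs(price_bucket(vertical, price) - segment)
    return sorted(results, key=score, reverse=True)

# Made-up activity and results, for illustration only.
segment = user_segment("shoes", [450, 480, 3000])
ranked = rerank("shoes", [("a", 4500, 0.9), ("b", 450, 0.8)], segment)
```

With this data the user falls in range 1, so the economical product "b" moves ahead of "a" despite its lower base score. Segments are computed per vertical because the same user may be price-sensitive for shoes but not for watches.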