One Catalog Service 
to rule them all 
Antoine Girbal 
Principal Solutions Engineer, MongoDB Inc. 
@antoinegirbal
Problem Statement
The many catalogs problem
1. One department, in charge of the master product data, works hard at fitting data into SQL tables
2. The resulting data sits in a SQL server with a couple of replicas. It is forbidden to hit it more than 100 times / sec
3. Other departments need to access the data far more often for their own services
4. Other departments need more information that is not available, since it did not fit in the long-devised, rigid SQL schema
5. ETLs and message buses are put in place so the other teams can try to figure it out themselves…
6. Data becomes inconsistent, fragmented, and out of date…
The problem is visible both internally and to customers!
How many catalogs and catalog caches do you have?
The many catalogs problem
(Diagram: "Dozens of catalogs!" The Product Department master catalog feeds the Online Store, Marketing, and Department 1, 3, 4 and 5 catalogs through ETLs and a message bus.)
Goal: Single View of Product
• Single view of a product: one central catalog service
• Flexible schema containing all useful data
• High, sustained read volume: 100k reads / s
• Can seamlessly absorb write spikes during catalog updates
• Advanced indexing and querying
• Geographical distribution for HA and low latency
Agenda 
1. MongoDB Overview 
2. Catalog Service Architecture 
3. Data Store Models 
4. Product Search 
MongoDB Overview
MongoDB is a great fit
• Holds complex JSON structures
• Dynamic schema for agility
• Complex querying and in-place updates
• Secondary, compound and geospatial indexing
• Full consistency, durability, atomic operations
• HA and geographic distribution via replication
• Near-linear scaling via sharding
• Overall, MongoDB is a unique fit!
MongoDB Strategic Advantages
(Diagram: the application sits on MongoDB, which is horizontally scalable via sharding, agile and flexible, offers high performance and strong consistency, and is highly available via replica sets.)
{ customer: "roger",
  date: new Date(),
  comment: "Spirited Away",
  tags: ["Tezuka", "Manga"] }
Build your data to fit your application: Relational vs. MongoDB
MongoDB:
{ customer_id : 1,
  name : "Mark Smith",
  city : "San Francisco",
  orders: [ {
      order_number : 13,
      store_id : 10,
      date: "2014-01-03",
      products: [
        { SKU: 24578234,
          Qty: 3,
          Unit_price: 350 },
        { SKU: 98762345,
          Qty: 1,
          Unit_price: 110 }
      ]
    },
    { <...> }
  ]
}

Relational:
CustomerID  First Name  Last Name  City
0           John        Doe        New York
1           Mark        Smith      San Francisco
2           Jay         Black      Newark
3           Meagan      White      London
4           Edward      Danields   Boston

Order Number  Store ID  Product       Customer ID
10            100       Tablet        0
11            101       Smartphone    0
12            101       Dishwasher    0
13            200       Sofa          1
14            200       Coffee table  1
15            201       Suit          2
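To make the contrast concrete, here is a minimal sketch, assuming the rows of the Customer and Order tables have already been loaded as plain JavaScript objects (the camelCase row field names and the `embedOrders` helper are hypothetical). The join happens once, when the documents are built, so reading a customer later needs no join at all.

```javascript
// Hypothetical helper: performs the customer/order "join" once, at write time,
// and embeds each customer's orders in the customer document.
function embedOrders(customers, orders) {
  // Group order rows by customer id so each customer is built in one pass.
  const byCustomer = new Map();
  for (const o of orders) {
    if (!byCustomer.has(o.customerId)) byCustomer.set(o.customerId, []);
    byCustomer.get(o.customerId).push({
      order_number: o.orderNumber,
      store_id: o.storeId,
      product: o.product,
    });
  }
  // One document per customer, with its orders nested as an array.
  return customers.map((c) => ({
    customer_id: c.customerId,
    name: `${c.firstName} ${c.lastName}`,
    city: c.city,
    orders: byCustomer.get(c.customerId) || [],
  }));
}

// Usage with a few rows from the two tables.
const customers = [
  { customerId: 0, firstName: "John", lastName: "Doe", city: "New York" },
  { customerId: 1, firstName: "Mark", lastName: "Smith", city: "San Francisco" },
];
const orders = [
  { orderNumber: 10, storeId: 100, product: "Tablet", customerId: 0 },
  { orderNumber: 13, storeId: 200, product: "Sofa", customerId: 1 },
  { orderNumber: 14, storeId: 200, product: "Coffee table", customerId: 1 },
];
const docs = embedOrders(customers, orders);
console.log(JSON.stringify(docs[1], null, 2));
```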
Notions
RDBMS     MongoDB
Database  Database
Table     Collection
Row       Document
Column    Field
Catalog Service Architecture
Architecture Overview
(Diagram: customer channels (Amazon, eBay, …; stores with POS, kiosk, …; mobile with smartphone and tablet; website; contact center; social with Facebook, Twitter, …) connect through web servers, application servers and an API to the information management layer: merchandising, content, inventory, customer, channel, sales & fulfillment, insight, social. A data and service integration layer links it to suppliers, the supply chain management system, the data warehouse, analytics, and 3rd-party / in-network systems.)
Commerce Functional Components
(Diagram organized in customer, enterprise and information layers. Customer's perspective: research, browse, search, select, shopping cart, purchase, checkout, receive, track, use, feedback, maintain. Look & feel: navigation, customization, personalization, branding, promotions, chat, ads. Enterprise functions: dialog, assist; market / offer (guide, offer, semantic search, recommend, rule-based decisions, pricing, coupons); sell / fulfill (orders, payments, fraud detection, fulfillment, business rules); insight (session capture, activity monitoring). Information management: merchandising, content, inventory, customer, channel, sales & fulfillment, insight, social.)
Merchandising Components
(Diagram: the Merchandising component, backed by MongoDB, covers Item, Variant, Hierarchy, Localization, Pricing, Promotions, Ratings & Reviews, Calendar and Semantic Search.)
Merchandising - Architecture
(Diagram: a Product Service API serves the Online Store, Marketing, Inventory, SCMS, the Public API, …; behind it, a MongoDB data store holds Items, Variants, Pricing, Promotions, Ratings & Reviews, …, alongside a search engine.)
Data Store Models
Models - Product Page
(Screenshot of a product page, annotated with: product images, general information, list of variants, external information, localized description.)
Models - Overview
• Item: the overall product info (e.g. Levi's 501)
• Variant: a specific variant of an item (e.g. in black, size 6), which typically has a specific SKU / UPC
• Price: price information may vary by store, variant, etc.
• Hierarchy: the item taxonomy
• Facet: facets to search products by
• Vendors: a given SKU may be available through several vendors if the site is a marketplace
> Don't try to fit it all into the same document!
Models – Overview
(Image: one item, dozens of colors, hundreds of sizes.)
Models - Overview
• A single item may have thousands of variants
• Each variant can have hundreds of attributes
• Altogether, a single item can represent many MBs worth of JSON text
• Don't try to fit everything into the same document!
• Use a schema that is natural and fits the API
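A back-of-the-envelope sketch of the "many MBs" point, assuming 2,000 variants with 200 small attributes each (illustrative counts, not taken from a real catalog): it builds such an item in memory and compares its JSON size with MongoDB's 16 MB maximum BSON document size. JSON length is only a proxy for BSON size, but the order of magnitude is what matters here.

```javascript
// Illustrative counts, not measured data: every variant and every attribute
// embedded in one item document.
function buildEmbeddedItem(variantCount, attrsPerVariant) {
  const variants = [];
  for (let v = 0; v < variantCount; v++) {
    const attrs = [];
    for (let a = 0; a < attrsPerVariant; a++) {
      attrs.push({ name: `attribute ${a}`, value: `value ${v}-${a}` });
    }
    variants.push({ sku: String(100000000000 + v), attrs });
  }
  return { _id: "054VA72303012P", variants };
}

// MongoDB's maximum BSON document size is 16 MB.
const BSON_LIMIT_BYTES = 16 * 1024 * 1024;

const bigItem = buildEmbeddedItem(2000, 200);
// JSON length is only a rough proxy for the BSON size.
const jsonBytes = Buffer.byteLength(JSON.stringify(bigItem), "utf8");
console.log(`~${(jsonBytes / 1e6).toFixed(1)} MB of JSON;`,
  jsonBytes > BSON_LIMIT_BYTES ? "over the 16 MB limit" : "under the 16 MB limit");
```

Splitting the data into item, variant and price documents keeps each document far below that limit and lets each one be read and updated on its own.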
Models - Item Model
{ "_id": "054VA72303012P", // the item id
  "desc": [ // item descriptions
    { "lang": "en", "val": "Give your dressy look a lift with ..." }, ...
  ],
  "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
  "category": "/84700/80009/1282094266/1200003270", // hierarchy
  "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
  "assets": { // references to all assets
    "imgs": [
      { "img": { "width": 1900, "height": 1900, "src": "http://..." } }, ...
    ]
  },
  "shipping": { /* shipping specs */ },
  "specs": { /* item specs */ },
  "attrs": [ // list of item attributes (facets)
    { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
    { "name": "Toe", "value": "Open toe" }, ...
  ],
  "variants": { // quick info on the variants
    "cnt": 9,
    "attrs": [
      { "dispType": "DROPDOWN", "name": "Color" },
      { "dispType": "DROPDOWN", "name": "Shoe Size" }, ...
    ]
  },
  "lastUpdated": 1400877254787 // keep track of updates
}
Models - Item Model
• Get item by id
  db.definition.findOne( { _id: "301671" } )
• Get items from a list of ids
  db.definition.find( { _id: { $in: [ "301671", "301672" ] } } )
• Get items by department
  db.definition.find( { category: { $regex: "^/84700/" } } )
• Get items by category prefix
  db.definition.find( { category: { $regex: "^/84700/80009/" } } )
• Secondary indices
  name, category, lastUpdated
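The department and category-prefix queries rely on the materialized path stored in `category`. A small sketch of a hypothetical helper that builds such a filter: MongoDB can use the `category` index efficiently for a case-sensitive regex anchored with `^`, which is why the queries are written as prefixes. Ending the prefix with `/` is an assumption that keeps `/84700` from also matching a sibling such as `/847001`.

```javascript
// Escape characters that have a special meaning inside a regular expression.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: ["84700", "80009"] -> { category: { $regex: "^/84700/80009/" } }
function categoryPrefixFilter(pathSegments) {
  // Leading "^" anchors the match so the index can be used for a prefix scan;
  // the trailing "/" restricts the match to the subtree below the last segment.
  const prefix = "/" + pathSegments.map(escapeRegex).join("/") + "/";
  return { category: { $regex: "^" + prefix } };
}

const deptFilter = categoryPrefixFilter(["84700"]);
const catFilter = categoryPrefixFilter(["84700", "80009"]);
// In the shell: db.definition.find(catFilter)
console.log(deptFilter, catFilter);
```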
Models – Variant Model
{ "_id": "05458452563", // the sku
  "name": "Width:Medium,Color:Ivory,Shoe Size:6.5",
  "itemId": "054VA72303012P", // reference to the item id
  "altIds": { "upc": "632576103580" },
  "assets": { // list of assets specific to the variant
    "imgs": [
      { "width": 1900, "height": 1900, "src": "http://..." },
      { "width": 1900, "height": 1900, "src": "http://..." }, ...
    ]
  },
  "attrs": [ // list of attributes specific to the variant
    { "name": "Width", "value": "Medium" },
    { "name": "Color", "family": "White", "value": "Ivory" },
    { "name": "Size", "value": "6.5" }, ...
  ],
  "lastUpdated": 1400877254787 // keep track of updates
}
Models – Variant Model
• Get variant from SKU
  db.variant.find( { _id: "05458452563" } )
• Get all variants for a product, sorted by SKU
  db.variant.find( { itemId: "054VA72303012P" } ).sort( { _id: 1 } )
• Indices
  itemId, lastUpdated
Models - Hierarchy
{
  "_id": "1200003270", // the node id
  "name": "Women's Heels & Pumps",
  "count": 22305, // how many items are in this category
  "parents": [ // list of parents
    "1282094266"
  ],
  "facets": [ // facets that exist for this category
    "Heel Height",
    "Toe",
    "Upper Material",
    "Width",
    "Shoe Size",
    "Color"
  ]
}
Models – Hierarchy
• Get hierarchy node by id
  db.hierarchy.find( { _id: "1200003270" } )
• Get child nodes from a parent id
  db.hierarchy.find( { parents: "1282094266" } )
• Get departments (no parent)
  db.hierarchy.find( { parents: null } )
• Secondary Indices
  parents
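Since each node stores its parent ids, an item's materialized `category` path can be rebuilt by walking up the hierarchy. A minimal sketch with a hypothetical `categoryPath` helper, assuming the nodes are already in memory and each node follows its first parent only. The parent chain below is inferred from the item's category path in the item model, and the department node has no `parents` field, consistent with the `parents: null` query.

```javascript
// Hypothetical helper: rebuilds a materialized category path by walking the
// "parents" links of hierarchy nodes up to the department.
function categoryPath(nodesById, leafId) {
  const ids = [];
  const seen = new Set();
  let node = nodesById.get(leafId);
  while (node) {
    if (seen.has(node._id)) throw new Error(`cycle at node ${node._id}`);
    seen.add(node._id);
    ids.unshift(node._id); // prepend, so the department ends up first
    const parentId = (node.parents || [])[0]; // departments have no parent
    node = parentId === undefined ? undefined : nodesById.get(parentId);
  }
  return "/" + ids.join("/");
}

// Nodes as they might come back from db.hierarchy.find().
const nodes = [
  { _id: "84700" },
  { _id: "80009", parents: ["84700"] },
  { _id: "1282094266", parents: ["80009"] },
  { _id: "1200003270", name: "Women's Heels & Pumps", parents: ["1282094266"] },
];
const byId = new Map(nodes.map((n) => [n._id, n]));
console.log(categoryPath(byId, "1200003270"));
```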
Models – per Store Pricing
Per-store pricing could result in billions of documents (potentially every SKU times every store)… unless it is built in a modular way:
• _id: concatenation of item and store
• Item: can be an item id or a variant id (SKU)
• Store: can be a store group (online) or a store id
{ "_id": "skuSPM8824542513_1234/store123",
  "price": 69.99,
  "sale": {
    "salePrice": 42.72,
    "saleEndDate": "2050-12-31 23:59:59"
  },
  "lastUpdated": 1374647707394 }
Models – per store Pricing
• Get all prices for a given item
  db.prices.find( { _id: /^item301671\// } )
• Get all prices for a given sku (the price could be set at the item level)
  db.prices.find( { _id: { $in: [ /^sku730223104376\//, /^item301671\// ] } } )
• Get minimum and maximum prices for a sku
  db.prices.aggregate( [
    { $match: { … } }, // e.g. the sku filter used above
    { $group: { _id: null, min: { $min: "$price" }, max: { $max: "$price" } } }
  ] )
• Get price for a sku and store id (returns up to 4 prices)
  db.prices.find( { _id: { $in: [ "sku730223104376/store1234",
                                  "sku730223104376/sgroup0",
                                  "item301671/store1234",
                                  "item301671/sgroup0" ] } },
                  { price: 1 } )
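The four `_id`s in the last query follow a fixed pattern, so the application can generate them and then pick one price. A sketch under stated assumptions: `priceIds` and `resolvePrice` are hypothetical helpers, the sample prices are made up, and the precedence (SKU before item, store before store group) is an assumption, since the slide only says the query returns up to 4 prices.

```javascript
// Hypothetical helper: candidate _ids, most specific first.
function priceIds(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}/${storeId}`,
    `sku${sku}/${storeGroup}`,
    `item${itemId}/${storeId}`,
    `item${itemId}/${storeGroup}`,
  ];
}

// Assumed precedence: the first candidate present in the results wins.
function resolvePrice(priceDocs, candidateIds) {
  const byId = new Map(priceDocs.map((d) => [d._id, d]));
  for (const id of candidateIds) {
    const doc = byId.get(id);
    if (doc) return { _id: id, price: doc.price };
  }
  return null; // no price defined for this sku / store
}

const ids = priceIds("730223104376", "301671", "store1234", "sgroup0");
// In the shell: db.prices.find({ _id: { $in: ids } }, { price: 1 })
const returned = [ // made-up results for illustration
  { _id: "item301671/sgroup0", price: 79.99 },
  { _id: "sku730223104376/sgroup0", price: 69.99 },
];
console.log(resolvePrice(returned, ids));
```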
Product Search
Search – Browse and Search products
(Screenshot of a listing page, annotated with: browse by category, special lists, filter by attributes, lists hundreds of item summaries.)
By far the toughest page to get right and fast…
Search – Browse and Search products
The previous page presents many challenges:
• Response within milliseconds for hundreds of items
• Faceted search on many attributes: category, brand, …
• Efficient sorting on several attributes: price, popularity
• Pagination, which requires deterministic ordering
> Search engines are built for this purpose!
Search – Traditional Architecture
(Diagram: the product data store is indexed into a product search engine. The application #1 obtains search result IDs from the search engine, then #2 obtains the objects by ID from a cache, where they are pre-joined into objects, or from the DB.)
Search – Traditional Architecture
The traditional architecture has several issues:
• 3 different systems to maintain: RDBMS, search engine, caching layer
• The RDBMS schema is complex and static
• Applications need to speak many languages
Search – Architecture with MongoDB
(Diagram: MongoDB, the product data store, holds ready-to-use product documents and is indexed into the search engine. Behind a Product API, applications #1 obtain search result IDs from the search engine, then #2 obtain the objects by list of IDs from MongoDB: the application issues a single query.)
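Step #2 of this flow can be a single `$in` query on `_id`. One detail matters: MongoDB does not return `$in` matches in the order of the ID list, so the application has to restore the search engine's ranking itself. A minimal sketch with hypothetical helper names, assuming the search engine returns a ranked list of summary `_id`s:

```javascript
// Step #2 as one round trip: fetch every document whose _id is in the list.
function byIdsQuery(searchIds) {
  return { _id: { $in: searchIds } };
}

// $in does not preserve the list order, so restore the search ranking.
// IDs no longer in the store (e.g. deleted since indexing) are skipped.
function inSearchOrder(searchIds, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  return searchIds.filter((id) => byId.has(id)).map((id) => byId.get(id));
}

// Ranked IDs from the search engine, documents back in arbitrary order.
const searchIds = ["3ZZVA46759401P", "054VA72303012P", "SPM10823491916"];
const fetched = [
  { _id: "SPM10823491916", name: "Drake Waterfowl Duck Label SS T-Shirt Army Green" },
  { _id: "3ZZVA46759401P", name: "Women's Chic - Black Velvet Suede" },
];
// In the shell: db.summary.find(byIdsQuery(searchIds))
console.log(inSearchOrder(searchIds, fetched).map((d) => d._id));
```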
Search - Mongo-Connector
(Diagram: Mongo Connector sits between MongoDB and the search engine. #1 It performs an initial dump of the collections; #2 it then streams updates via the oplog. Documents go through translation and filtering before indexing.)
Search - Mongo-Connector
• Open-source project at https://github.com/10gen-labs/mongo-connector
• Python app that reads from MongoDB's oplog and publishes to a target of choice
• Supports initial sync by dumping the data
• Default connectors for Solr, Elasticsearch, and another MongoDB cluster
• Easily extensible to update other systems, such as SQL databases
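The oplog is a capped collection in which each entry records one write: `op` is `"i"` (insert), `"u"` (update) or `"d"` (delete), `ns` is the `database.collection` namespace, `o` holds the document or the change, and `o2` identifies the updated document. The sketch below illustrates the streaming-and-filtering step; it is not mongo-connector's implementation, the in-memory `Map` stands in for the target system, and handling operator-style updates by re-reading the document is a simplifying assumption.

```javascript
// Illustrative sketch of oplog tailing, NOT mongo-connector's code.
// `index` stands in for the search engine (a Map of _id -> document); only
// entries whose namespace is in `namespaces` are applied (filtering).
function applyOplogEntry(index, entry, namespaces) {
  if (!namespaces.has(entry.ns)) return "skipped";
  switch (entry.op) {
    case "i": // insert: "o" holds the full new document
      index.set(entry.o._id, entry.o);
      return "indexed";
    case "d": // delete: "o" holds the _id of the removed document
      index.delete(entry.o._id);
      return "deleted";
    case "u": { // update: "o2" identifies the document, "o" describes the change
      const isReplacement = Object.keys(entry.o).every((k) => !k.startsWith("$"));
      if (isReplacement) {
        index.set(entry.o2._id, { ...entry.o, _id: entry.o2._id });
        return "indexed";
      }
      return "refetch"; // operator-style update: re-read the full document
    }
    default: // commands ("c"), no-ops ("n"), ... are ignored here
      return "skipped";
  }
}

const index = new Map();
const namespaces = new Set(["catalog.summary"]);
const entries = [
  { op: "i", ns: "catalog.summary", o: { _id: "3ZZVA46759401P", name: "Women's Chic - Black Velvet Suede" } },
  { op: "i", ns: "catalog.prices", o: { _id: "item301671/sgroup0", price: 69.99 } },
  { op: "u", ns: "catalog.summary", o2: { _id: "3ZZVA46759401P" }, o: { $set: { lastUpdated: 1400877254787 } } },
  { op: "d", ns: "catalog.summary", o: { _id: "3ZZVA46759401P" } },
];
const results = entries.map((e) => applyOplogEntry(index, e, namespaces));
console.log(results, index.size);
```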
Search – Mongo-Connector
What is the data to index?
Search – More Searching
(Screenshot annotated with: images of the matching variants are displayed; price and rating; facets for variants.)
Search – More Searching
… more challenges:
• Attributes at the variant level: color, size, etc.
• Attributes from other docs: pricing, ratings, etc.
• Display the matching variant's image and details
• Thousands of matching variants for an item, yet a single item must be displayed
• Properly indexing the data is a challenge
> Need for a single summary document per item
Search - Architecture
(Diagram: the MongoDB data store holds Items, Variants, Summaries, Pricing, Promotions, and Ratings & Reviews.)
Search – Summary Model
{ "_id": "3ZZVA46759401P", // the item id
  "name": "Women's Chic - Black Velvet Suede",
  "dep": "84700", // useful as standalone for indexing
  "cat": "/84700/80009/1282094266/1200003270",
  "desc": { "lang": "en", "val": "This pointy toe slingback ..." },
  "img": { "width": 450, "height": 330, "src": "http://..." },
  "attrs": [ // global attributes, easily indexable by the search engine
    "heel height=mid (1-3/4 to 2-1/4 in.)",
    "brand=metaphor",
    "shoe size=6",
    "shoe size=6.5", ...
  ],
  "sattrs": [ // global attributes, not to be indexed
    "upper material=synthetic",
    "toe=open toe", ...
  ],
  "vars": [
    { "id": "05497884001",
      "img": [ /* images */ ],
      "attrs": [ /* list of variant attributes to index */ ],
      "sattrs": [ /* list of variant attributes not to index */ ] }, …
  ]
}
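The summary document is derived data. A sketch of a hypothetical `buildSummary` helper, assuming item and variant documents shaped like the earlier models, with the set of indexed attribute names passed in (the deck does not say how that split is decided). Values are lowercased to match the `"brand=metaphor"` style, and the variants' indexed attributes are rolled up to the item level, as the `"shoe size=6"` / `"shoe size=6.5"` entries suggest.

```javascript
// { name: "Color", value: "Ivory" } -> "color=ivory"
function toFacet(attr) {
  return `${attr.name}=${attr.value}`.toLowerCase();
}

// Hypothetical helper: denormalizes an item and its variants into one summary.
function buildSummary(item, variants, indexedNames) {
  const indexed = (a) => indexedNames.has(a.name.toLowerCase());
  const notIndexed = (a) => !indexed(a);
  const vars = variants.map((v) => ({
    id: v._id,
    img: (v.assets && v.assets.imgs) || [],
    attrs: v.attrs.filter(indexed).map(toFacet),
    sattrs: v.attrs.filter(notIndexed).map(toFacet),
  }));
  // Roll indexed variant attributes up to the item level, without duplicates.
  const attrs = new Set(item.attrs.filter(indexed).map(toFacet));
  for (const v of vars) v.attrs.forEach((f) => attrs.add(f));
  return {
    _id: item._id,
    name: item.name,
    dep: item.category.split("/")[1], // first path segment = department
    cat: item.category,
    attrs: [...attrs],
    sattrs: item.attrs.filter(notIndexed).map(toFacet),
    vars,
  };
}

// Trimmed-down versions of the item and variant examples.
const item = {
  _id: "054VA72303012P",
  name: "Women's Kate Ivory Peep-Toe Stiletto Heel",
  category: "/84700/80009/1282094266/1200003270",
  attrs: [
    { name: "Heel Height", value: "High (2-1/2 to 4 in.)" },
    { name: "Toe", value: "Open toe" },
  ],
};
const variants = [{
  _id: "05458452563",
  itemId: "054VA72303012P",
  attrs: [
    { name: "Width", value: "Medium" },
    { name: "Color", family: "White", value: "Ivory" },
    { name: "Size", value: "6.5" },
  ],
}];
const summary = buildSummary(item, variants, new Set(["heel height", "color", "size"]));
console.log(JSON.stringify(summary, null, 2));
```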
Search – Using Solr
Let's use Solr…
Search - Using Solr
Defining the schema in schema.xml:
<fields>
  <!-- some of the core fields -->
  <field name="_id" type="string" indexed="true" stored="true" />
  <field name="name" type="text_general" indexed="true" stored="true" />
  <field name="cat" type="string" indexed="true" stored="true" />
  <field name="price" type="float" indexed="true" stored="true"/>
  <!-- the full text to index -->
  <field name="desc.0.val" type="text_general" indexed="true" stored="true"/>
  <!-- dynamic attributes for faceting -->
  <dynamicField name="attrs.*" type="string" indexed="true" stored="true"/>
  <!-- some Solr-specific fields -->
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW"
         multiValued="false"/>
  <dynamicField name="*" type="ignored" multiValued="true"/>
</fields>
Search - Using Solr
Starting up the connector:
$ mongo-connector \
    -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 \
    -t http://localhost:8983/solr \
    -d mongo_connector/doc_managers/solr_doc_manager.py \
    -n "catalog.summary" \
    --auto-commit-interval=60 …
(-m: the MongoDB host; -t: the Solr URL; -n: the target summary collection; --auto-commit-interval=60: commit every minute)
> Keep it running: it just streams the oplog.
Search – Using Solr
A document in Solr looks like:
{ "desc.0.val": "Our classic \"Flying Duck\" styled as a ...",
  "name": "Drake Waterfowl Duck Label SS T-Shirt Army Green",
  "attrs.1": "brand=Drake Waterfowl",
  "attrs.0": "style=t-shirts",
  "cat": "/84700/1200000239/1282094207/1200000817",
  "_id": "SPM10823491916",
  "_version_": 1479173524477182000,
  "timestamp": "2014-09-13T23:09:59.782Z"
}
Lists are flattened, which makes them difficult to use.
> Must use named fields to implement facets
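The flattening is what makes facets awkward: list positions become part of the field name, so `brand` may sit in `attrs.1` in one document and in `attrs.0` in another. A minimal sketch of that kind of flattening, illustrative only and not the connector's actual code:

```javascript
// Nested objects and arrays become dotted keys; array positions are used
// as key segments (Object.entries on an array yields ["0", x], ["1", y], ...).
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object") {
    for (const [key, child] of Object.entries(value)) {
      flatten(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = value; // leaf value stored under its full dotted path
  }
  return out;
}

const flat = flatten({
  _id: "SPM10823491916",
  attrs: ["style=t-shirts", "brand=Drake Waterfowl"],
  desc: [{ lang: "en", val: "Our classic \"Flying Duck\" styled as a ..." }],
});
console.log(flat);
```

Faceting on `attrs.1` would therefore mix brands with styles across documents, which is why named fields are needed.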
Search – Using Elasticsearch
Let's use Elasticsearch…
Search - Using Elasticsearch
Elasticsearch understands the whole document right off the bat.
We just need to tell ES not to tokenize the facets ("string" is the name of the default mapping type):
$ curl -XPOST localhost:9200/largecat3.summary -d '{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "string" : {
      "properties" : {
        "attrs" : { "type" : "string", "index" : "not_analyzed" }
      }
    }
  }
}'
> Everything else is indexed auto-magically!
Search - Using Elasticsearch
Starting up the connector:
$ mongo-connector \
    -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 \
    -t http://localhost:9200 \
    -d mongo_connector/doc_managers/elastic_doc_manager.py \
    -n "catalog.summary" \
    --auto-commit-interval=60 …
(-m: the MongoDB host; -t: the Elasticsearch URL; -n: the target summary collection; --auto-commit-interval=60: commit every minute)
> Keep it running: it just streams the oplog.
Search - Using Elasticsearch
Querying for documents, with facet info, works well:
$ curl -X POST "http://localhost:9200/largecat3.summary/_search?pretty=true" -d '
{ "query" : { "query_string" : { "query" : "Ipad" } },
  "facets" : { "tags" : { "terms" : { "field" : "attrs" } } } }'
{ "took" : 6,
  "hits" : {
    "total" : 151,
    "max_score" : 0.5892989,
    "hits" : [ {
      "_index" : "largecat3.summary",
      "_type" : "string",
      "_id" : "000000000000000012730000000000QAU-QR2442P",
      "_score" : 0.5892989,
      "_source" : { /* original JSON from MongoDB */ }
    }, ... ]
  },
  "facets" : {
    "tags" : {
      "_type" : "terms",
      "total" : 1577,
      "terms" : [ { "term" : "ring size=9", "count" : 120 },
                  { "term" : "ring size=8", "count" : 120 },
                  { "term" : "metal=sterling silver", "count" : 112 }, ... ]
    }
  }
}
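Because each facet term is a `name=value` string, the application still has to split the terms to show them grouped by attribute. A sketch of hypothetical post-processing of the `terms` array from that response, splitting on the first `=`:

```javascript
// Hypothetical helper: group "name=value" facet terms by attribute name.
function groupFacetTerms(terms) {
  const groups = {};
  for (const { term, count } of terms) {
    const eq = term.indexOf("=");
    if (eq < 0) continue; // not a "name=value" term
    const name = term.slice(0, eq);
    const value = term.slice(eq + 1); // values may themselves contain "="
    if (!groups[name]) groups[name] = [];
    groups[name].push({ value, count });
  }
  return groups;
}

const facetTerms = [
  { term: "ring size=9", count: 120 },
  { term: "ring size=8", count: 120 },
  { term: "metal=sterling silver", count: 112 },
];
const sidebar = groupFacetTerms(facetTerms);
console.log(JSON.stringify(sidebar, null, 2));
```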
Search – Using MongoDB Indexing
How about MongoDB's indexes and full-text search?
Search – Using MongoDB indexing
The summary contains:
• Department, e.g. "Shoes"
• Fields to index
  – Category path, e.g. "Shoes/Women/Pumps"
  – Price
  – List of item attributes, e.g. Brand = Guess
  – List of variant attributes, e.g. Color = red
• Fields not to index
  – List of item secondary attributes, e.g. Style = Designer
  – List of variant secondary attributes, e.g. heel height = 4.0
Search - Using MongoDB indexing
• Get summary from item id
  db.variation.find( { _id: "p301671" } )
• Get a summary's specific variation from SKU
  db.variation.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
• Get summaries by department, sorted by rating
  db.variation.find( { department: "Shoes" } ).sort( { rating: 1 } )
• Get summaries with a mix of parameters
  db.variation.find( { department : "Shoes",
                       "vars.attrs" : { "color" : "Gray" },
                       "category" : { "$regex" : "^/Shoes/Women/" },
                       "price" : { "$gte" : 65.99, "$lte" : 180.99 } } )
Search – Using MongoDB indexing
• The following indices are used:
  – department + attr + category + _id
  – department + vars.attrs + category + _id
  – department + category + _id
  – department + price + _id
  – department + rating + _id
• _id is used for pagination
• Can take advantage of index intersection
• With several attributes specified (e.g. color=red and size=6), which one is looked up?
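The `+ _id` suffix on each index is what makes pagination deterministic: sorting on `{ price: 1, _id: 1 }` gives ties a stable order, and the next page can start strictly after the last `(price, _id)` pair instead of skipping N documents. This keyset approach is a sketch, not necessarily what the deck's benchmark used; the helper names and page size are hypothetical, and the in-memory `nextPage` only mirrors the query's semantics.

```javascript
// Filter for the page after `last` (the final document of the previous page).
function pageFilter(department, last) {
  if (!last) return { department }; // first page
  return {
    department,
    $or: [
      { price: { $gt: last.price } },
      { price: last.price, _id: { $gt: last._id } },
    ],
  };
}
// Sort matching the department + price + _id index; _id breaks price ties.
const PAGE_SORT = { price: 1, _id: 1 };
// Shell usage (hypothetical page size of 20):
//   db.variation.find(pageFilter("Shoes", last)).sort(PAGE_SORT).limit(20)

// In-memory equivalent of that query, to show that pages never overlap.
function nextPage(docs, department, last, size) {
  const cmpId = (a, b) => (a < b ? -1 : a > b ? 1 : 0);
  return docs
    .filter((d) => d.department === department)
    .filter((d) => !last || d.price > last.price ||
                   (d.price === last.price && cmpId(d._id, last._id) > 0))
    .sort((a, b) => a.price - b.price || cmpId(a._id, b._id))
    .slice(0, size);
}
```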
Search – Using MongoDB indexing
Facet samples:
{ "_id" : "Accessory Type=Hosiery" , "count" : 14 }
{ "_id" : "Ladder Material=Steel" , "count" : 2 }
{ "_id" : "Gold Karat=14k" , "count" : 10138 }
{ "_id" : "Stone Color=Clear" , "count" : 1648 }
{ "_id" : "Metal=White gold" , "count" : 10852 }
A single operation both inserts and updates (upsert):
db.facet.update( { _id: "Accessory Type=Hosiery" },
                 { $inc: { count: 1 } }, true, false )
The facet with the lowest count is the most restrictive…
It should come first in the $all query!
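Putting the two ideas together, the application can look up the counts of the selected facets in the `facet` collection and order the `$all` array from least to most common. A sketch with a hypothetical `restrictiveAllFilter` helper; the `attrs` field name is assumed from the summary model, and the counts are taken from the facet samples:

```javascript
// Hypothetical helper: most restrictive (lowest-count) facet first in $all.
// Counts could be loaded with db.facet.find({ _id: { $in: selected } }).
function restrictiveAllFilter(selected, facetCounts) {
  // Facets without a known count sort last.
  const countOf = (f) => (facetCounts.has(f) ? facetCounts.get(f) : Number.MAX_SAFE_INTEGER);
  const ordered = [...selected].sort((a, b) => countOf(a) - countOf(b));
  return { attrs: { $all: ordered } };
}

const facetCounts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);
const allFilter = restrictiveAllFilter(
  ["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"], facetCounts);
console.log(allFilter.attrs.$all);
```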
Search – Comparing Solutions
• Search engine advantages:
  – Index size (~10x smaller than MongoDB's)
  – Indexing speed
  – Read speed, integrated cache
  – Support for all languages
  – Built-in faceted search, including facet counts
• MongoDB indexing advantages:
  – Built into the data store, no additional server / software needed
  – Single query to get the results
  – Can filter down to the matching variant entry and save computing
> The winner here is Elasticsearch
Search – Benchmarking
Department  Category  Price  Primary attribute  Average (ms)  90th (ms)  95th (ms)
1           0         0      0                  2             3          3
1           1         0      0                  1             2          2
1           0         1      0                  1             2          3
1           1         1      0                  1             2          2
1           0         0      1                  0             1          2
1           1         0      1                  0             1          1
1           0         1      1                  1             2          2
1           1         1      1                  0             1          1
1           0         0      2                  1             3          3
1           1         0      2                  0             2          2
1           0         1      2                  10            20         35
1           1         1      2                  0             1          1
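The deck does not say how the Average, 90th and 95th columns were computed. A sketch of one common way to derive them from raw per-query latencies, assuming the nearest-rank percentile definition; the latencies in the usage line are made up and are not the benchmark's data:

```javascript
// Nearest-rank percentile: the smallest sample such that at least p% of the
// samples are less than or equal to it.
function summarize(latenciesMs) {
  if (latenciesMs.length === 0) return null;
  const sorted = [...latenciesMs].sort((a, b) => a - b);
  const n = sorted.length;
  const percentile = (p) => sorted[Math.max(0, Math.ceil((p * n) / 100) - 1)];
  const avg = sorted.reduce((sum, x) => sum + x, 0) / n;
  return { avg, p90: percentile(90), p95: percentile(95) };
}

// Made-up latencies (ms) for ten queries, for illustration only.
const stats = summarize([1, 1, 2, 1, 10, 1, 3, 1, 1, 1]);
console.log(stats);
```

With small samples, a single slow query dominates the upper percentiles, so averages and percentiles are best reported together, as the table does.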
Closing Comments
Q & A Time 
Thank You! 
Antoine Girbal 
Principal Solutions Engineer, MongoDB Inc. 
@antoinegirbal
Retail referencearchitecture productcatalog

 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

Retail Reference Architecture: Product Catalog

  • 1. One Catalog Service to Rule Them All. Antoine Girbal, Principal Solutions Engineer, MongoDB Inc. (@antoinegirbal)
  • 3. The many catalogs problem
  • 4. The many catalogs problem: 1. The department in charge of the master product data works hard at fitting it into SQL tables. 2. The resulting data sits in a SQL server with a couple of replicas, and it is forbidden to query it more than 100 times per second. 3. Other departments need to access the data far more often for their own services. 4. Other departments need more information, which is not available because it did not fit into that long-devised, rigid SQL schema. 5. ETLs and message buses are put in place so that other teams can try to figure it out themselves… 6. Data becomes inconsistent, fragmented and out of date… The problem is visible both internally and to customers!
  • 5. Search – Using Solr. How many catalogs and catalog caches do you have?
  • 6. The many catalogs problem (diagram): the Product Department master catalog, separate catalogs for the online store, marketing and departments 1, 3, 4 and 5, and the ETLs and message bus between them. Dozens of catalogs!
  • 7. Goal: Single View of Product • A single view of each product: one central catalog service • A flexible schema containing all useful data • High, sustained read volume: 100k reads/s • Seamlessly absorbs write spikes during catalog updates • Advanced indexing and querying • Geographical distribution for HA and low latency
  • 8. Agenda: 1. MongoDB Overview 2. Catalog Service Architecture 3. Data Store Models 4. Product Search
  • 10. MongoDB is a great fit • Holds complex JSON structures • Dynamic schema for agility • Complex querying and in-place updating • Secondary, compound and geo indexing • Full consistency, durability, atomic operations • HA and geo-distribution via replication • Near-linear scaling via sharding • Overall, MongoDB is a unique fit!
  • 11. MongoDB Strategic Advantages (diagram around the application): horizontally scalable (sharding); agile and flexible; high performance and strong consistency; highly available (replica sets). Sample document: { customer: "roger", date: new Date(), comment: "Spirited Away", tags: ["Tezuka", "Manga"] }
  • 12. Build your data to fit your application: relational vs. MongoDB
    Relational, customers table:
      CustomerID | First Name | Last Name | City
      0 | John | Doe | New York
      1 | Mark | Smith | San Francisco
      2 | Jay | Black | Newark
      3 | Meagan | White | London
      4 | Edward | Danields | Boston
    Relational, orders table:
      Order Number | Store ID | Product | Customer ID
      10 | 100 | Tablet | 0
      11 | 101 | Smartphone | 0
      12 | 101 | Dishwasher | 0
      13 | 200 | Sofa | 1
      14 | 200 | Coffee table | 1
      15 | 201 | Suit | 2
    MongoDB, one document per customer with embedded orders:
      { customer_id : 1, name : "Mark Smith", city : "San Francisco",
        orders: [ { order_number : 13, store_id : 10, date: "2014-01-03",
                    products: [ { SKU: 24578234, Qty: 3, Unit_price: 350 },
                                { SKU: 98762345, Qty: 1, Unit_price: 110 } ] },
                  { <...> } ] }
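The reshaping on this slide can be sketched in plain JavaScript. The hypothetical helper `embedOrders` below is not part of any MongoDB API. It groups order rows by customer and embeds them, which yields documents shaped like the MongoDB example. The rows come from the tables above, but the embedded field names are assumptions and the `products` sub-array is omitted.

```javascript
// Rows mirroring part of the two relational tables above.
const customers = [
  { CustomerID: 0, FirstName: "John", LastName: "Doe", City: "New York" },
  { CustomerID: 1, FirstName: "Mark", LastName: "Smith", City: "San Francisco" },
];
const orders = [
  { OrderNumber: 10, StoreID: 100, Product: "Tablet", CustomerID: 0 },
  { OrderNumber: 13, StoreID: 200, Product: "Sofa", CustomerID: 1 },
  { OrderNumber: 14, StoreID: 200, Product: "Coffee table", CustomerID: 1 },
];

// Hypothetical helper: embed each customer's orders in one document so a
// single read returns everything the page needs (no join at query time).
function embedOrders(customers, orders) {
  const byCustomer = new Map();
  for (const o of orders) {
    if (!byCustomer.has(o.CustomerID)) byCustomer.set(o.CustomerID, []);
    byCustomer.get(o.CustomerID).push({
      order_number: o.OrderNumber,
      store_id: o.StoreID,
      product: o.Product,
    });
  }
  return customers.map((c) => ({
    customer_id: c.CustomerID,
    name: `${c.FirstName} ${c.LastName}`,
    city: c.City,
    orders: byCustomer.get(c.CustomerID) || [],
  }));
}

const docs = embedOrders(customers, orders);
console.log(JSON.stringify(docs[1]));
```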
  • 13. Notions
      RDBMS | MongoDB
      Database | Database
      Table | Collection
      Row | Document
      Column | Field
  • 15. Architecture Overview (diagram). Customer channels: Amazon, eBay, …; stores (POS, kiosk, …); mobile (smartphone, tablet); website; contact center; social (Facebook, Twitter, …). Serving tier: web servers, application servers, API, data and service integration. Information management: merchandising, content, inventory, customer, channel, sales & fulfillment, insight, social. Connected systems: suppliers, supply chain management system, data warehouse, analytics, 3rd party, in network.
  • 16. Commerce Functional Components (diagram). Customer's perspective: research, browse, search, select, shopping cart, purchase, checkout, receive, track, use, feedback, maintain, dialog, assist. Look & feel: navigation, customization, personalization, branding, promotions, chat, ads. Market / offer: guide, offer, semantic search, recommend, rule-based decisions, pricing, coupons. Sell / fulfill: orders, payments, fraud detection, fulfillment, business rules. Insight: session capture, activity monitoring. Enterprise information layer (information management): merchandising, content, inventory, customer, channel, sales & fulfillment, insight, social.
  • 17. Merchandising Components on MongoDB: item, variant, hierarchy, localization, pricing, promotions, ratings & reviews, calendar, semantic search.
  • 18. Merchandising – Architecture (diagram): a MongoDB data store (items, pricing, promotions, variants, ratings & reviews, …) and a search engine sit behind the Product Service API, which serves the online store, marketing, inventory, SCMS, a public API, …
  • 20. Models – Product Page: product images, general information, list of variants, external information, localized description.
  • 21. Models – Overview • Item: the overall product info (e.g. Levi’s 501) • Variant: a specific variant of an item (e.g. in black, size 6), which typically has a specific SKU / UPC • Price: price information may vary based on the store, the variant, etc. • Hierarchy: the item taxonomy • Facet: facets to search products by • Vendors: a given SKU may be available through several vendors if the site is a marketplace > Don't try to fit it all into the same document!
  • 22. Models – Overview: one item, hundreds of sizes, dozens of colors.
  • 23. Models – Overview • A single item may have thousands of variants • Each variant can have hundreds of attributes • Altogether, a single item can represent many MBs worth of JSON text (and a single MongoDB document is capped at 16 MB) • Don't try to fit everything into the same document! • Use a schema that is natural and fits the API
  • 24. Models – Item Model
      {
        "_id": "054VA72303012P",                          // the item id
        "desc": [                                         // item descriptions
          { "lang": "en", "val": "Give your dressy look a lift with ..." },
          ...
        ],
        "name": "Women's Kate Ivory Peep-Toe Stiletto Heel",
        "category": "/84700/80009/1282094266/1200003270", // hierarchy
        "brand": { "id": "2483510", "img": "http://...", "name": "Metaphor" },
        "assets": {                                       // references to all assets
          "imgs": [ { "img": { "width": 1900, "height": 1900, "src": "http://..." } }, ... ]
        },
        "shipping": { ... },                              // shipping specs
        "specs": { ... },                                 // item specs
        "attrs": [                                        // list of item attributes (facets)
          { "name": "Heel Height", "value": "High (2-1/2 to 4 in.)" },
          { "name": "Toe", "value": "Open toe" },
          ...
        ],
        "variants": {                                     // quick info on the variants
          "cnt": 9,
          "attrs": [
            { "dispType": "DROPDOWN", "name": "Color" },
            { "dispType": "DROPDOWN", "name": "Shoe Size" },
            ...
          ]
        },
        "lastUpdated": 1400877254787                      // keep track of updates
      }
  • 25. Models – Item Model
    • Get item by id: db.definition.findOne( { _id: "301671" } )
    • Get items from a list of ids: db.definition.find( { _id: { $in: [ "301671", "301672" ] } } )
    • Get items by department: db.definition.find( { category: { $regex: "^/84700/" } } )
    • Get items by category prefix: db.definition.find( { category: { $regex: "^/84700/80009/" } } )
    • Secondary indices: name, category, lastUpdated
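The department and category queries above depend on an anchored prefix regex on `category`. MongoDB can serve this kind of case-sensitive prefix expression from the `category` index as a range. The sketch below builds that filter from a path. `categoryPrefixFilter` and `escapeRegex` are hypothetical helpers, not part of the shell. One design choice: the trailing slash is enforced so that "/84700" cannot also match "/847001/…".

```javascript
// Escape characters that have a special meaning in regular expressions.
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: filter matching every item whose category path starts
// with the given path. "^" anchors the match (a prefix expression); the
// trailing "/" keeps "/84700" from also matching "/847001/...".
function categoryPrefixFilter(path) {
  const prefix = path.endsWith("/") ? path : path + "/";
  return { category: { $regex: "^" + escapeRegex(prefix) } };
}

const filter = categoryPrefixFilter("/84700/80009");
console.log(JSON.stringify(filter));
// In the mongo shell this would be used as: db.definition.find(filter)
```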
  • 26. Models – Variant Model
      {
        "_id": "05458452563",                    // the SKU
        "name": "Width:Medium,Color:Ivory,Shoe Size:6.5",
        "itemId": "054VA72303012P",              // reference to the item id
        "altIds": { "upc": "632576103580" },
        "assets": {                              // list of assets specific to the variant
          "imgs": [
            { "width": 1900, "height": 1900, "src": "http://..." },
            { "width": 1900, "height": 1900, "src": "http://..." },
            ...
          ]
        },
        "attrs": [                               // list of attributes specific to the variant
          { "name": "Width", "value": "Medium" },
          { "name": "Color", "family": "White", "value": "Ivory" },
          { "name": "Size", "value": "6.5" },
          ...
        ],
        "lastUpdated": 1400877254787             // keep track of updates
      }
  • 27. Models – Variant Model
    • Get variant from SKU: db.variant.find( { _id: "05458452563" } )
    • Get all variants for a product, sorted by SKU: db.variant.find( { itemId: "054VA72303012P" } ).sort( { _id: 1 } )
    • Secondary indices: itemId, lastUpdated
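The `lastUpdated` field and its index suggest a way for other services to pull only what changed since their last sync. The sketch below assumes that use. It is not described in the slides, and both helper names are hypothetical. It builds the shell filter and sort, and it shows the same logic on an in-memory array so the next checkpoint can be computed.

```javascript
// Hypothetical helper: filter/sort for "modified since the last sync",
// relying on lastUpdated (epoch milliseconds) and its index.
function changedSinceQuery(checkpoint) {
  return { filter: { lastUpdated: { $gt: checkpoint } }, sort: { lastUpdated: 1 } };
}

// In-memory equivalent: what the query returns, plus the next checkpoint
// (the largest lastUpdated seen, or the old one if nothing changed).
function pullChanges(docs, checkpoint) {
  const changed = docs
    .filter((d) => d.lastUpdated > checkpoint)
    .sort((a, b) => a.lastUpdated - b.lastUpdated);
  const next = changed.length ? changed[changed.length - 1].lastUpdated : checkpoint;
  return { changed, next };
}

const variants = [
  { _id: "05458452563", itemId: "054VA72303012P", lastUpdated: 1400877254787 },
  { _id: "05458452564", itemId: "054VA72303012P", lastUpdated: 1400877000000 }, // hypothetical
];
const { changed, next } = pullChanges(variants, 1400877100000);
console.log(changed.map((v) => v._id), next);
// Shell form: const q = changedSinceQuery(ts); db.variant.find(q.filter).sort(q.sort)
```

With `$gt`, a write that lands in the same millisecond as the checkpoint can be missed. Using `$gte` and dropping already-seen `_id`s avoids that, at the cost of some duplicates.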
  • 28. Models – Hierarchy
      {
        "_id": "1200003270",             // the node id
        "name": "Women's Heels & Pumps",
        "count": 22305,                  // how many items are in this category
        "parents": [ "1282094266" ],     // list of parents
        "facets": [                      // facets that exist for this category
          "Heel Height", "Toe", "Upper Material", "Width", "Shoe Size", "Color"
        ]
      }
  • 29. Models – Hierarchy
    • Get hierarchy node by id: db.hierarchy.find( { _id: "1200003270" } )
    • Get child nodes from a parent id: db.hierarchy.find( { parents: "1282094266" } )
    • Get departments (no parent): db.hierarchy.find( { parents: null } )
    • Secondary indices: parents
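Going up the tree is the inverse of "get child nodes from a parent id". Starting from a node, you follow `parents` until you reach a department, for example to render breadcrumbs. The sketch below assumes the hierarchy is small enough to hold in a `Map` keyed by `_id`; in the shell each step would be a `find` by `_id`. It follows only the first parent. The helper and the names of the three upper nodes are hypothetical.

```javascript
// Hypothetical helper: walk from a node up to its department (no parents)
// and return the names top-down. Follows the first parent only; a visited
// set guards against accidental cycles in the data.
function breadcrumb(nodesById, id) {
  const path = [];
  const seen = new Set();
  let node = nodesById.get(id);
  while (node && !seen.has(node._id)) {
    seen.add(node._id);
    path.unshift(node.name);
    const parents = node.parents || [];
    node = parents.length ? nodesById.get(parents[0]) : undefined;
  }
  return path;
}

const nodes = new Map(
  [
    { _id: "84700", name: "Shoes", parents: null },                     // hypothetical name
    { _id: "80009", name: "Women", parents: ["84700"] },                // hypothetical name
    { _id: "1282094266", name: "Women's Shoes", parents: ["80009"] },   // hypothetical name
    { _id: "1200003270", name: "Women's Heels & Pumps", parents: ["1282094266"] },
  ].map((n) => [n._id, n])
);
console.log(breadcrumb(nodes, "1200003270").join(" > "));
```

Items already store the full path in `category`, so this walk is only needed when you start from a hierarchy node.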
  • 30. Models – Per-Store Pricing
    Per-store pricing could result in billions of documents… unless it is built in a modular way, with an _id that concatenates item and store:
    • Item: can be an item id or a variant id (SKU)
    • Store: can be a store group (online) or a store id
      {
        "_id": "skuSPM8824542513_1234/store123",
        "price": 69.99,
        "sale": { "salePrice": 42.72, "saleEndDate": "2050-12-31 23:59:59" },
        "lastUpdated": 1374647707394
      }
  • 31. Models – Per-Store Pricing
    • Get all prices for a given item: db.prices.find( { _id: /^item301671\// } )
    • Get all prices for a given SKU (the price could be at item level): db.prices.find( { _id: { $in: [ /^sku730223104376\//, /^item301671\// ] } } )
    • Get minimum and maximum prices for a SKU: db.prices.aggregate( [ { $match: { _id: { $in: [ /^sku730223104376\//, /^item301671\// ] } } }, { $group: { _id: 1, min: { $min: "$price" }, max: { $max: "$price" } } } ] )
    • Get the price for a SKU and store id (returns up to 4 prices): db.prices.find( { _id: { $in: [ "sku730223104376/store1234", "sku730223104376/sgroup0", "item301671/store1234", "item301671/sgroup0" ] } }, { price: 1 } )
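The last query returns up to four price documents, and the application still has to pick one. The sketch below builds the four candidate `_id`s in the key format from the slide. It then resolves the price by taking the most specific match: SKU before item, store before store group. That precedence order is an assumption, the slides do not state it. `priceKeys`, `resolvePrice` and the sample prices are hypothetical.

```javascript
// Candidate price _ids for one SKU in one store, most specific first.
// Key format from the slide: "<sku|item><id>/<store id|store group>".
function priceKeys(sku, itemId, storeId, storeGroup) {
  return [
    `sku${sku}/${storeId}`,
    `sku${sku}/${storeGroup}`,
    `item${itemId}/${storeId}`,
    `item${itemId}/${storeGroup}`,
  ];
}

// Pick the most specific document among those returned by the $in query
// (precedence order assumed); null when no price exists at any level.
function resolvePrice(keys, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  for (const k of keys) {
    if (byId.has(k)) return byId.get(k);
  }
  return null;
}

const keys = priceKeys("730223104376", "301671", "store1234", "sgroup0");
// Shell: db.prices.find({ _id: { $in: keys } }, { price: 1 })
const returned = [
  { _id: "item301671/sgroup0", price: 69.99 },       // hypothetical prices
  { _id: "sku730223104376/sgroup0", price: 64.99 },
];
console.log(resolvePrice(keys, returned));
```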
  • 33. Search – Browse and Search Products (page screenshot): browse by category, special lists, filter by attributes; the page lists hundreds of item summaries. By far the toughest page to get right and fast…
  • 34. Search – Browse and Search Products. The previous page presents many challenges: • Response within milliseconds for hundreds of items • Faceted search on many attributes: category, brand, … • Efficient sorting on several attributes: price, popularity • Pagination, which requires deterministic ordering > Search engines are built for this purpose!
  • 35. Search – Traditional Architecture (diagram): the product data store is indexed into the product search engine. The application #1 obtains the search result IDs from the search engine, then #2 obtains the objects, pre-joined, by ID from the cache or the DB.
  • 36. Search – Traditional Architecture. Issues with the traditional architecture: • 3 different systems to maintain: RDBMS, search engine, caching layer • The RDBMS schema is complex and static • Applications need to speak many languages
  • 37. Search – Architecture with MongoDB (diagram): MongoDB is the product data store, holding ready-to-use product documents, and is indexed into the search engine. Behind the Product API, #1 the search result IDs are obtained from the search engine, then #2 the objects are obtained from MongoDB by list of IDs; the application issues a single query.
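Step #2 has a subtlety. `find({ _id: { $in: ids } })` does not return documents in the order of the `ids` list, so the Product API has to restore the search engine's ranking itself. The sketch below does this in memory. `orderBySearchResults` and the sample data are hypothetical. It also drops IDs whose documents have disappeared since the search index was built.

```javascript
// Hypothetical helper: reorder the documents returned by an $in query to
// follow the ranked IDs from the search engine, skipping IDs with no
// matching document (e.g. deleted since the search index was updated).
function orderBySearchResults(ids, docs) {
  const byId = new Map(docs.map((d) => [d._id, d]));
  return ids.filter((id) => byId.has(id)).map((id) => byId.get(id));
}

const rankedIds = ["SPM10823491916", "3ZZVA46759401P", "054VA72303012P"];
// Hypothetical result of db.summary.find({ _id: { $in: rankedIds } })
const fromMongo = [
  { _id: "054VA72303012P", name: "Women's Kate Ivory Peep-Toe Stiletto Heel" },
  { _id: "SPM10823491916", name: "Drake Waterfowl Duck Label SS T-Shirt Army Green" },
];
console.log(orderBySearchResults(rankedIds, fromMongo).map((d) => d._id));
```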
  • 38. Search – Mongo-Connector (diagram): Mongo Connector reads from MongoDB and indexes into the search engine: #1 initial dump of the collections, #2 updates streamed via the oplog, with translation and filtering in between.
  • 39. Search – Mongo-Connector • Open-source project at https://github.com/10gen-labs/mongo-connector • Python app that reads from MongoDB's oplog and publishes to the target of choice • Supports initial sync by dumping the data • Default connectors for Solr, Elastic Search and another MongoDB cluster • Easily extensible to update other systems, such as SQL databases
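The core of a connector like this is replaying oplog entries for one namespace into a target. The sketch below is a simplified model of that idea, not mongo-connector's code, with a `Map` standing in for the search index. Real oplog update entries are more complex and their format varies by server version. This version handles only inserts, deletes, top-level `$set`/`$unset` and full replacements, and `applyOplogEntry` is a hypothetical name.

```javascript
// Simplified oplog replay into a Map (stand-in for the search index).
// Handles top-level $set/$unset and full replacements only.
function applyOplogEntry(target, entry, namespace) {
  if (entry.ns !== namespace) return; // like -n "catalog.summary"
  switch (entry.op) {
    case "i": // insert: o is the full document
      target.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: o2 holds the _id, o the change
      const id = entry.o2._id;
      const current = target.get(id) || { _id: id };
      if (entry.o.$set || entry.o.$unset) {
        const next = { ...current, ...(entry.o.$set || {}) };
        for (const field of Object.keys(entry.o.$unset || {})) delete next[field];
        target.set(id, next);
      } else {
        target.set(id, { ...entry.o, _id: id }); // replacement document
      }
      break;
    }
    case "d": // delete: o holds the _id
      target.delete(entry.o._id);
      break;
    default: // "n" (no-op), "c" (command), ... ignored in this sketch
      break;
  }
}

const index = new Map();
const ns = "catalog.summary";
[
  { op: "i", ns, o: { _id: "A1", name: "Boot", price: 50 } },
  { op: "u", ns, o2: { _id: "A1" }, o: { $set: { price: 45 } } },
  { op: "i", ns: "catalog.prices", o: { _id: "P1", price: 1 } }, // other namespace
  { op: "i", ns, o: { _id: "B2", name: "Pump" } },
  { op: "d", ns, o: { _id: "B2" } },
].forEach((e) => applyOplogEntry(index, e, ns));
console.log([...index.values()]);
```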
  • 40. Search – Mongo-Connector: What is the data to index?
  • 41. Search – More Searching (screenshot): images of the matching variants are displayed; price and rating; facets for variants.
  • 42. Search – More Searching … more challenges: • Attributes at the variant level: color, size, etc. • Attributes from other documents: pricing, ratings, etc. • Display the matching variant's image and details • An item may have thousands of matching variants, yet a single item must be displayed • Properly indexing the data is a challenge > Need for a single summary document per item
  • 43. Search – Architecture (diagram): the MongoDB data store holds items, summaries, pricing, ratings & reviews, variants and promotions.
  • 44. Search – Summary Model
      {
        "_id": "3ZZVA46759401P",                  // the item id
        "name": "Women's Chic - Black Velvet Suede",
        "dep": "84700",                           // useful as standalone for indexing
        "cat": "/84700/80009/1282094266/1200003270",
        "desc": { "lang": "en", "val": "This pointy toe slingback ..." },
        "img": { "width": 450, "height": 330, "src": "http://..." },
        "attrs": [                                // global attributes, easily indexable by the search engine
          "heel height=mid (1-3/4 to 2-1/4 in.)",
          "brand=metaphor",
          "shoe size=6",
          "shoe size=6.5",
          ...
        ],
        "sattrs": [                               // global attributes, not to be indexed
          "upper material=synthetic",
          "toe=open toe",
          ...
        ],
        "vars": [
          {
            "id": "05497884001",
            "img": [ ... ],                       // images
            "attrs": [ ... ],                     // list of variant attributes to index
            "sattrs": [ ... ]                     // list of variant attributes not to index
          },
          ...
        ]
      }
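A summary like this has to be derived from the item and variant documents. The sketch below builds one, assuming three things. First, the set of indexed attribute names is configured per catalog. Second, values are lower-cased "name=value" strings, as in the example. Third, indexed variant values are rolled up into the item-level `attrs`, as "shoe size=6" and "shoe size=6.5" appear to be. `buildSummary` and the test data are hypothetical, and `img` is left out for brevity.

```javascript
// Hypothetical helper: build one summary document per item from the item
// and its variants, using the field names of the summary model above.
function buildSummary(item, variants, indexedNames) {
  const kv = (a) => `${a.name}=${a.value}`.toLowerCase();
  const indexed = (a) => indexedNames.has(a.name.toLowerCase());
  const attrs = new Set(item.attrs.filter(indexed).map(kv));
  const sattrs = new Set(item.attrs.filter((a) => !indexed(a)).map(kv));
  const vars = variants.map((v) => {
    const va = v.attrs.filter(indexed).map(kv);
    va.forEach((x) => attrs.add(x)); // roll indexed variant values up to the item
    return {
      id: v._id,
      img: (v.assets && v.assets.imgs) || [],
      attrs: va,
      sattrs: v.attrs.filter((a) => !indexed(a)).map(kv),
    };
  });
  return {
    _id: item._id,
    name: item.name,
    dep: item.category.split("/")[1], // "/84700/..." -> "84700"
    cat: item.category,
    desc: (item.desc || []).find((d) => d.lang === "en"),
    attrs: [...attrs],
    sattrs: [...sattrs],
    vars,
  };
}

const item = {
  _id: "054VA72303012P",
  name: "Women's Kate Ivory Peep-Toe Stiletto Heel",
  category: "/84700/80009/1282094266/1200003270",
  desc: [{ lang: "en", val: "Give your dressy look a lift with ..." }],
  attrs: [{ name: "Heel Height", value: "High (2-1/2 to 4 in.)" }, { name: "Toe", value: "Open toe" }],
};
const variants = [{ _id: "05458452563", attrs: [
  { name: "Color", value: "Ivory" }, { name: "Shoe Size", value: "6.5" }, { name: "Width", value: "Medium" },
] }];
const summary = buildSummary(item, variants, new Set(["heel height", "color", "shoe size"]));
console.log(JSON.stringify(summary, null, 2));
```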
  • 45. Search – Using Solr: Let's use Solr…
  • 46. Search – Using Solr
  • 47. Search – Using Solr. Defining the schema in schema.xml:
      <fields>
        <!-- some of the core fields -->
        <field name="_id" type="string" indexed="true" stored="true" />
        <field name="name" type="text_general" indexed="true" stored="true" />
        <field name="cat" type="string" indexed="true" stored="true" />
        <field name="price" type="float" indexed="true" stored="true" />
        <!-- the full text to index -->
        <field name="desc.0.val" type="text_general" indexed="true" stored="true" />
        <!-- dynamic attributes for faceting -->
        <dynamicField name="attrs.*" type="string" indexed="true" stored="true" />
        <!-- some Solr-specific fields -->
        <field name="_version_" type="long" indexed="true" stored="true" />
        <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false" />
        <dynamicField name="*" type="ignored" multiValued="true" />
      </fields>
  • 48. Search – Using Solr. Starting up the connector:
      mongo-connector -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 -t http://localhost:8983/solr -d mongo_connector/doc_managers/solr_doc_manager.py -n "catalog.summary" --auto-commit-interval=60 …
    -m is the MongoDB address, -t the Solr URL, -n the target summary collection, and --auto-commit-interval=60 commits every minute. Keep it running; it will just stream the oplog.
  • 49. Search – Using Solr. A document in Solr looks like:
      {
        "desc.0.val": "Our classic \"Flying Duck\" styled as a ...",
        "name": "Drake Waterfowl Duck Label SS T-Shirt Army Green",
        "attrs.1": "brand=Drake Waterfowl",
        "attrs.0": "style=t-shirts",
        "cat": "/84700/1200000239/1282094207/1200000817",
        "_id": "SPM10823491916",
        "_version_": 1479173524477182000,
        "timestamp": "2014-09-13T23:09:59.782Z"
      }
    Lists are flattened, which makes them difficult to use > Must use named fields to implement facets
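The sketch below shows why the lists end up as positional keys. Flattening a nested document turns array positions into key segments ("attrs.0", "desc.0.val"). It also shows the named-field alternative: parse each "name=value" string into a field like `attrs.brand`, which the `attrs.*` dynamic field can match. Both helpers are hypothetical and are not the connector's code. Replacing spaces in names with "_" is an assumption. Multi-valued facets such as shoe size would likely also need `multiValued="true"` on that dynamic field.

```javascript
// Flatten a nested document into dotted keys; array indexes become key
// segments, which is how "attrs.0" and "desc.0.val" appear above.
function flatten(value, prefix = "", out = {}) {
  if (value !== null && typeof value === "object") {
    for (const [k, v] of Object.entries(value)) { // arrays yield "0", "1", ...
      flatten(v, prefix ? `${prefix}.${k}` : k, out);
    }
  } else {
    out[prefix] = value;
  }
  return out;
}

// Alternative: named facet fields ("attrs.brand") built from "name=value"
// strings; entries without "=" are skipped, spaces in names become "_".
function namedFacetFields(attrs) {
  const out = {};
  for (const a of attrs) {
    const i = a.indexOf("=");
    if (i < 0) continue;
    const key = `attrs.${a.slice(0, i).trim().replace(/\s+/g, "_")}`;
    (out[key] = out[key] || []).push(a.slice(i + 1));
  }
  return out;
}

console.log(flatten({ attrs: ["style=t-shirts", "brand=Drake Waterfowl"] }));
console.log(namedFacetFields(["style=t-shirts", "brand=Drake Waterfowl"]));
```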
  • 50. Search – Using Elastic Search: Let's use Elastic Search…
  • 51. Search – Using Elastic Search
  • 52. Search – Using Elastic Search. Elastic Search understands the whole document right off the bat; we just need to tell ES not to tokenize the facets ("string" is the name of the default mapping type here):
      $ curl -XPOST localhost:9200/largecat3.summary -d '{
        "settings": { "number_of_shards": 1 },
        "mappings": {
          "string": {
            "properties": {
              "attrs": { "type": "string", "index": "not_analyzed" }
            }
          }
        }
      }'
    > Everything else is indexed auto-magically!
  • 53. Search – Using Elastic Search. Starting up the connector:
      mongo-connector -m ec2-54-80-63-229.compute-1.amazonaws.com:27017 -t http://localhost:9200 -d mongo_connector/doc_managers/elastic_doc_manager.py -n "catalog.summary" --auto-commit-interval=60 …
    -m is the MongoDB address, -t the Elastic Search URL, -n the target summary collection, and --auto-commit-interval=60 commits every minute. Keep it running; it will just stream the oplog.
  • 54. Search – Using Elastic Search. Querying for documents with facet info works well:
      $ curl -X POST "http://localhost:9200/largecat3.summary/_search?pretty=true" -d '{
        "query": { "query_string": { "query": "Ipad" } },
        "facets": { "tags": { "terms": { "field": "attrs" } } }
      }'
    Response:
      {
        "took": 6,
        "hits": {
          "total": 151,
          "max_score": 0.5892989,
          "hits": [
            {
              "_index": "largecat3.summary",
              "_type": "string",
              "_id": "000000000000000012730000000000QAU-QR2442P",
              "_score": 0.5892989,
              "_source": { ... }    // original JSON from MongoDB
            },
            ...
          ]
        },
        "facets": {
          "tags": {
            "_type": "terms",
            "total": 1577,
            "terms": [
              { "term": "ring size=9", "count": 120 },
              { "term": "ring size=8", "count": 120 },
              { "term": "metal=sterling silver", "count": 112 },
              ...
            ]
          }
        }
      }
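The `facets` section used here was deprecated in Elasticsearch 1.0 and removed in 2.0 in favor of aggregations. On newer clusters, the same request is written with a `terms` aggregation. There the `attrs` field would also be mapped as `keyword` rather than a `not_analyzed` string. The sketch below only builds the request body, which would be POSTed to `…/_search`, and reads the buckets from a response. The helper names are hypothetical.

```javascript
// Request body with a terms aggregation named "tags" over the facet field,
// the aggregation-based equivalent of the "facets" request above.
function searchWithFacetsBody(text, facetField, size = 10) {
  return {
    query: { query_string: { query: text } },
    aggs: { tags: { terms: { field: facetField, size } } },
  };
}

// Read facet counts from a search response: buckets carry key and doc_count.
function facetCounts(response) {
  return response.aggregations.tags.buckets.map((b) => ({ term: b.key, count: b.doc_count }));
}

const body = searchWithFacetsBody("Ipad", "attrs");
console.log(JSON.stringify(body));
```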
  • 55. Search – Using MongoDB Indexing: How about MongoDB's indexes and full-text search?
  • 56. Search – Using MongoDB Indexing. The summary contains: • Department, e.g. "Shoes" • Fields to index: – category path, e.g. "Shoes/Women/Pumps" – price – list of item attributes, e.g. Brand = Guess – list of variant attributes, e.g. Color = red • Fields not to index: – list of item secondary attributes, e.g. Style = Designer – list of variant secondary attributes, e.g. heel height = 4.0
  • 57. Search – Using MongoDB Indexing
    • Get summary from item id: db.variation.find( { _id: "p301671" } )
    • Get a summary's specific variation from SKU: db.variation.find( { "vars.sku": "730223104376" }, { "vars.$": 1 } )
    • Get summaries by department, sorted by rating: db.variation.find( { department: "Shoes" } ).sort( { rating: 1 } )
    • Get summaries with a mix of parameters: db.variation.find( { department: "Shoes", "vars.attrs": { "color": "Gray" }, category: { $regex: "^/Shoes/Women/" }, price: { $gte: 65.99, $lte: 180.99 } } )
  • 58. Search – Using MongoDB Indexing
    • The following indices are used: – department + attr + category + _id – department + vars.attrs + category + _id – department + category + _id – department + price + _id – department + rating + _id
    • _id is used for pagination
    • Can take advantage of index intersection
    • With several attributes specified (e.g. color=red and size=6), which one is looked up?
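One way to read "_id is used for pagination" is keyset pagination: sort by `_id` and ask each next page for `_id` values greater than the last one seen. That gives the deterministic ordering the search page needs. The sketch below builds such a query from the search parameters of the previous slide. It assumes string `attrs` matched with `$all`, as the facet slide that follows suggests, and a category prefix free of regex metacharacters. `summaryQuery` and its parameter names are hypothetical.

```javascript
// Hypothetical helper: filter, sort and limit for one page of summaries.
// categoryPrefix is assumed to contain no regex metacharacters.
function summaryQuery({ department, categoryPrefix, minPrice, maxPrice, attrs = [], afterId, pageSize = 50 }) {
  const filter = { department };
  if (categoryPrefix) filter.category = { $regex: "^" + categoryPrefix };
  if (minPrice !== undefined || maxPrice !== undefined) {
    filter.price = {};
    if (minPrice !== undefined) filter.price.$gte = minPrice;
    if (maxPrice !== undefined) filter.price.$lte = maxPrice;
  }
  if (attrs.length) filter.attrs = { $all: attrs };
  if (afterId !== undefined) filter._id = { $gt: afterId }; // next page starts after the last _id
  return { filter, sort: { _id: 1 }, limit: pageSize };
}

const q = summaryQuery({
  department: "Shoes", categoryPrefix: "/Shoes/Women/", minPrice: 65.99, maxPrice: 180.99,
  attrs: ["color=gray"], afterId: "p301671", pageSize: 20,
});
console.log(JSON.stringify(q));
// Shell: db.variation.find(q.filter).sort(q.sort).limit(q.limit)
```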
  • 59. Search – Using MongoDB Indexing. Facet samples:
      { "_id": "Accessory Type=Hosiery", "count": 14 }
      { "_id": "Ladder Material=Steel", "count": 2 }
      { "_id": "Gold Karat=14k", "count": 10138 }
      { "_id": "Stone Color=Clear", "count": 1648 }
      { "_id": "Metal=White gold", "count": 10852 }
    Single operation to insert / update: db.facet.update( { _id: "Accessory Type=Hosiery" }, { $inc: { count: 1 } }, true, false )
    The facet with the lowest count is the most restrictive… It should come first in the $all query!
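This slide's advice can be sketched in memory: count how many summaries carry each facet, then order the requested facets by ascending count before building `{ attrs: { $all: [...] } }`. The counts could also be computed in the shell, roughly with `db.summary.aggregate([ { $unwind: "$attrs" }, { $group: { _id: "$attrs", count: { $sum: 1 } } } ])`. Both helpers are hypothetical. Treating a facet absent from the counts as count 0, and therefore most restrictive, is an assumption.

```javascript
// Count, per "name=value" facet, how many summaries carry it (what the
// facet collection holds; duplicates within one summary count once).
function countFacets(summaries) {
  const counts = new Map();
  for (const s of summaries) {
    for (const f of new Set(s.attrs)) counts.set(f, (counts.get(f) || 0) + 1);
  }
  return counts;
}

// Order requested facets from most to least restrictive (lowest count
// first), following the slide's advice for the $all query.
function orderForAll(requested, counts) {
  return [...requested].sort((a, b) => (counts.get(a) || 0) - (counts.get(b) || 0));
}

const counts = new Map([
  ["Gold Karat=14k", 10138],
  ["Stone Color=Clear", 1648],
  ["Metal=White gold", 10852],
]);
const ordered = orderForAll(["Metal=White gold", "Gold Karat=14k", "Stone Color=Clear"], counts);
console.log({ attrs: { $all: ordered } });
```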
  • 60. Search – Comparing Solutions • Search engine advantages: – Index size (~10x smaller than MongoDB's) – Indexing speed – Read speed, integrated cache – Support for all languages – Built-in faceted search, including facet counts • MongoDB indexing advantages: – Built into the data store, no additional server / software needed – Single query to get the results – Can filter down to the matching variant entry, saving computation > The winner here is Elastic Search
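  • 61. Search – Benchmarking
      Department | Category | Price | Primary attribute | Time average (ms) | 90th percentile (ms) | 95th percentile (ms)
      1 | 0 | 0 | 0 | 2 | 3 | 3
      1 | 1 | 0 | 0 | 1 | 2 | 2
      1 | 0 | 1 | 0 | 1 | 2 | 3
      1 | 1 | 1 | 0 | 1 | 2 | 2
      1 | 0 | 0 | 1 | 0 | 1 | 2
      1 | 1 | 0 | 1 | 0 | 1 | 1
      1 | 0 | 1 | 1 | 1 | 2 | 2
      1 | 1 | 1 | 1 | 0 | 1 | 1
      1 | 0 | 0 | 2 | 1 | 3 | 3
      1 | 1 | 0 | 2 | 0 | 2 | 2
      1 | 0 | 1 | 2 | 10 | 20 | 35
      1 | 1 | 1 | 2 | 0 | 1 | 1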
  • 63. Q & A Time
  • 64. Thank You! Antoine Girbal, Principal Solutions Engineer, MongoDB Inc. (@antoinegirbal)