Real Time Data Analytics with MongoDB and Fluentd at Wish

Analytics @ Wish
Powered by Fluentd & MongoDB

Wish ♥︎ MongoDB
• Primary database since 2011
• 67x mongod
• AWS → bare metal (SSDs ftw!)

What’s Wish?
• Mobile eCommerce
• 30M+ users worldwide
• Top 10 iOS & Android

Experiment
’cause otherwise you’re just guessing…

Hypothesis
“Billing Zip” is confusing outside America

Data
Compare checkout conversions for international Android users
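The comparison described on this slide boils down to two conversion rates and the relative difference between them. A minimal sketch, assuming a control group and a treatment group; the function names and the counts are made up for illustration and are not Wish's data:

```python
def conversion_rate(completed: int, started: int) -> float:
    """Fraction of started checkouts that were completed."""
    if started <= 0:
        raise ValueError("started must be positive")
    return completed / started


def relative_lift(control_rate: float, treatment_rate: float) -> float:
    """Relative change of the treatment rate over the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (treatment_rate - control_rate) / control_rate


# Illustrative counts only (hypothetical, not from the talk).
control = conversion_rate(completed=120, started=1000)
treatment = conversion_rate(completed=150, started=1000)
lift = relative_lift(control, treatment)
```

A real report would also want a significance test before drawing a conclusion; that is left out here.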

Conclusion
~7% boost in mobile sales

Goal
Frictionless analytics to everyone

{"solution":
["logging",
"aggregation",
"analysis",
"serving"]
}

Request Logs = Source of Truth

{'contest_impressions':'53060fbd34067e4d6cee70f4,535ad13a7360465e2ca799f8,528b714df689996fdb574800,52597
6a71c23882ab3b73ecb,5285df6db5baba737f459037,5208ae7d3deaf74a6cc65da4,5209e5c31c238861a1ab91cc,5285df6db
5baba735f459061,51f7778f3ba3770a514a5431,527be1fc227d210d2bcdeac5,532fcfe3796f6832713b5c3a,527be203227d2
10dd5cdeaac,52d3ef2806ea960dde85cb97,527bc781227d210d8acdea47,527bc793227d210d4fcdea48,
5208ad653deaf74a4bc65d41,5208acdd1c238846f9ab9028,5182fc1273c67621e507591b,5311ae6c796f68283f8f86c3,
52de2bf4ab980a2d00da786a,5208a9c53deaf74a75c65c6b,52eca45a717951350382e4be,52d3ef73bb5aa51ccf866c01,
533d6fae5aefb0427771f346,5285df6db5baba734d45901b,51c27d8d5ffe8f0b0b9b0359,52d0e002a30fb227725b6e06,
52f71bd89f5ef741d8f34698,52d3ef71bb5aa53135866d76, 5308bc467360464265101ed9,52d3ef27bb5aa5024d866c09,
52c399d60599170e49fd866e,5209be541c23886177ab91db,5208b15e1c2388615fab91b7', '_country_code': u'CA',
'_lang': u'en', '_fb_uid': 500406911, '_device_id': None, '_uid': '4eb346049b120f09f60007c0', '_tid': 2,
'_host': 'adam.corp.contextlogic.com', '_last_id': u'cc3aa96b2b3c45bca11009edc049f2f6',
'_experiment_tags': ['mobile_commerce_home_v4_female_ignore', 'mobile_large_cart_cell_ignore',
'hannibal_cohort_firsttime_buyer_ignore', 'localize_product_names__fr_ignore',
'mobile_cart_guarantee_view_ignore', 'mobile_related_tags_v2_ignore', 'shipping_price_us_ignore',
'stripe_settle_on_ship_control', 'related_super_feed_iphone_show-v4',
'mobile_commerce_home_v3_male_i18n_show', 'braintree_settle_on_ship_control',
'mobile_show_tabbed_billing_page_i18n_ignore', 'mobile_new_guarantee_text_ios_ignore',
'mobile_use_category_signup_flow_i18n_ignore', 'male_curated_first_ipad_ignore',
'mobile_commerce_home_v4_female_i18n_ignore', 'commerce_product_page_show',
'mobile_use_category_signup_flow_v3_ignore', 'mobile_save_for_price_us_female_relaunch_2_ignore',
'web_stripe_checkout_ignore', 'mobile_show_tabbed_billing_page_us_ignore', 'stripe_checkout_show',
'shipping_price_i18n_fixed-price-promo', 'chukou1_pilot_experiment_ignore',
'mobile_implicit_ratings_v1_show', 'feed_commerce_2_control', 'mobile_commerce_home_v3_male_ignore',
'swap_out_male_feed_show-weight-deep', 'related_super_feed_ipad_ignore',
'female_curated_first_iphone_ignore', 'mobile_psuedo_localized_currency_show',
'hannibal_cohort_repeat_buyer_ignore', 'web_boleto_checkout_ignore', 'exploration_v2_control',
'female_curated_first_android_ignore', 'male_curated_first_android_ignore',
'related_super_feed_android_show-v4', 'curated_feed_female_shopping_ignore',
'mobile_localized_currency_control', 'male_curated_first_iphone_ignore',
'mobile_show_required_shipping_fields_ignore', 'mobile_ct2_variable_shipping_price_showcountry',
'mobile_c2c_ignore', 'localize_product_names__es_ignore', 'related_products_v2_control',
'female_curated_first_ipad_ignore', 'mobile_categories_v1_ignore', 'related_super_feed_show',
'mobile_baby_category_signup_flow_ignore', 'mobile_checkout_offer_v2_control',
'mobile_minimum_notification_interval_ignore', 'mobile_show_tabbed_feed_existing_user_ignore',
'mobile_cart_fake_only_x_left_show', 'late_shipment_apology_v2_ignore',
'mobile_show_tabbed_feed_new_user_ignore'], '_app_type': 0, 'impression_feed_category': None, '_client':
'web', '_refer_url': None, 'sort': 'recommended', '_user_agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X
10_9_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36', '_arguments': {},
'_currency': 'CAD', '_protocol': 'http', 'offset': 0, '_method': 'GET', 'count': 40, '_locale': 'en',
'_timestamp': 1401996333, '_bsid': '979b5fbcad4f4fdbb1477ae7ba8ed123', '_is_cached': False, '_version':
None, '_response_status': 200, 'filter': 'all', '_response_time': 0.2887430191040039, '_uri': '/',
'_remote_ip': None, '_is_user_pending': False, '_id': '1e6135e3d2eb4214afdbd99456d71183'}
A feed request…

{
'products_shown': '...',
'feed_category': null,
'sort': 'recommended',
'filter': 'all',
'offset': 0,
'count': 40,

'_uid': '4eb34609ff60007c0',
'_client': 'web',
'_country_code': 'CA',
'_id': '1e6135e3d9456d7183',
'_last_id': 'cc39edc49f2f6',
'_experiment_tags': [...],

'_uri': '/',
'_refer_url': null,
'_arguments': {},
'_method': 'GET',
'_locale': 'en',
'_response_status': 200
}

One problem
Searching all requests ever is slow

Transaction Log
{'txn_id': '5390c295e9b9bbe68b2',
'user_id': '4eb346049b9f60007c0',
'total': 18.0,
'shipping': 2.0,
'items': [{
'product_id': '537b42379b9e3f55f',
'qty': 1,
'price': 16.0 }]
}

Centralize Logs
• Synchronously?
• Fire & forget?
• fluentd!
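The trade-off on this slide is between blocking the request until the log is written and handing the event off without waiting. A minimal sketch of the hand-off idea, assuming an in-memory buffer drained by a background thread; `FireAndForgetLogger` and its `send` callback are hypothetical names, and this is not how Fluentd or Wish implement it:

```python
import queue
import threading


class FireAndForgetLogger:
    """Hypothetical helper: buffer events and ship them off the request path."""

    def __init__(self, send, max_buffered=10000):
        self._send = send  # callable(tag, record), e.g. a network write
        self._queue = queue.Queue(maxsize=max_buffered)
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def log(self, tag, record):
        """Enqueue without blocking; returns False if the buffer is full."""
        try:
            self._queue.put_nowait((tag, record))
            return True
        except queue.Full:
            return False

    def _run(self):
        while True:
            item = self._queue.get()
            if item is None:  # sentinel pushed by close()
                return
            try:
                self._send(*item)
            except Exception:
                pass  # a real logger would retry or spill to disk

    def close(self):
        """Flush everything already queued, then stop the worker."""
        self._queue.put(None)
        self._worker.join()
```

The request handler only pays for an in-memory enqueue; delivery, retries, and buffering become the job of a separate component, which is the role the talk gives to fluentd.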

Architecture
App server (Wish + fluentd) → Aggregation servers (fluentd) → Hadoop/Hive

Hadoop & Hive
• Great for log analysis
• Arbitrary queries
• No schema design constraints

Hadoop & Hive
• Running a Hadoop cluster sucks
– TreasureData’s managed Hive solution rocks!

MongoDB!
• Analysis results → MongoDB
• Store all combinations
– Unsexy, but fast
– 2 TB total

Schema
{"_id": ObjectId(…),
"click_id": 2,
"source_page_id": 1000,
"count": 20171,
"timestamp": 20140601,
"gender": "Male",
"client": "Android",
"country": "CA",
"experiment_tag": "zip_help_text-show"}
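"Store all combinations" means one pre-computed count per distinct combination of dimension values, so serving a report is a filter plus a sum. A minimal sketch of that row shape, assuming the dimension names from the schema above; the talk attributes the actual aggregation to Hadoop/Hive, so this Python only illustrates the idea, and `aggregate` and `total` are hypothetical helpers:

```python
from collections import Counter

# Dimension fields taken from the schema slide.
DIMENSIONS = ("click_id", "source_page_id", "timestamp",
              "gender", "client", "country", "experiment_tag")


def aggregate(events):
    """Collapse raw events into one row per combination of dimensions."""
    counts = Counter()
    for event in events:
        counts[tuple(event[d] for d in DIMENSIONS)] += 1
    return [dict(zip(DIMENSIONS, key), count=n) for key, n in counts.items()]


def total(rows, **filters):
    """Sum the counts of every row matching the given dimension values."""
    return sum(row["count"] for row in rows
               if all(row[k] == v for k, v in filters.items()))
```

For example, `total(rows, country="CA", client="Android")` adds up every stored combination for Canadian Android users without touching the raw logs.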

Let’s Review
Logs (app servers) → Fluentd → Hadoop/Hive → MongoDB

Tools
Who doesn’t love nifty graphs?

Perimeter
• A/B test reports
– Summary tables, detailed CSVs
– See trade-offs

Analytics = faster iteration
More growth, more revenue

Analytics = faster iteration
Powered by Fluentd & MongoDB

Happy Analyzing!
adam@wish.com

{"subtitle": "Why Fluentd?"}

http://cacm.acm.org/blogs/blog-cacm/169199-data-science-workflow-overview-and-challenges/fulltext

Acquire Data (or so you think)
WUT!? Invalid UTF8?
Fix the encoding issue…
Yell at the engineers
Some columns are missing!?
Run the script… DIVISION BY ZERO!!!

Logging.priority
=> :not_super_high

Analytics.priority
=> :very_high

Analytics.needs? :logs
=> true

{"subtitle": "Overview",
"has_code": true,
"has_example": true}

127.0.0.1 - - [05/Feb/2012:17:11:55
+0000] "GET / HTTP/1.1" 200 140 "-"
"Mozilla/5.0 (Windows NT 6.1; WOW64)
AppleWebKit/535.19 (KHTML, like Gecko)
Chrome/18.0.1025.5 Safari/535.19"

{
"host": "127.0.0.1",
"user": "-",
"method": "GET",
"path": "/",
"code": "200",
"size": "140",
"referer": "-",
"agent": "Mozilla/5.0 (Windows…"
}
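The step from the raw Apache line to the structured record above can be sketched with a regular expression. This is an illustration of the idea, assuming the combined log format shown on the previous slide; `LOG_PATTERN` and `parse_line` are hypothetical names, not Fluentd's own `apache2` parser:

```python
import re

# Named groups mirror the field names on the slide; all values stay strings.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

SAMPLE_LINE = (
    '127.0.0.1 - - [05/Feb/2012:17:11:55 +0000] "GET / HTTP/1.1" 200 140 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/535.19 '
    '(KHTML, like Gecko) Chrome/18.0.1025.5 Safari/535.19"'
)


def parse_line(line):
    """Return (time_string, record) for a matching line, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    time_string = record.pop("time")  # time travels separately from the record
    return time_string, record
```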

["05/Feb/2012:17:11:55","web.access",{
"host": "127.0.0.1",
"user": "-",
"method": "GET",
"path": "/",
"code": "200",
"size": "140",
"referer": "-",
"agent": "Mozilla/5.0 (Windows…"
}]
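The slide shows an event as a (time, tag, record) triple. A minimal sketch of building one, assuming the time should be carried as Unix seconds rather than as the Apache string shown above; `make_event` is a hypothetical helper:

```python
from datetime import datetime


def make_event(tag, apache_time, record):
    """Bundle a record with its tag and a Unix timestamp."""
    # Apache timestamps look like "05/Feb/2012:17:11:55 +0000".
    parsed = datetime.strptime(apache_time, "%d/%b/%Y:%H:%M:%S %z")
    return [int(parsed.timestamp()), tag, record]
```

The tag (`web.access` here) is what later decides where the event is routed.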

?
web.mongodb
web.file
web.hdfs
web.s3
web.mysql

<source>
type tail
path /var/log/apache/access.log
tag web.access
format apache2
</source>
Apache log → Fluentd

<source>
type tail
path /var/log/apache/access.log
tag web.access
format apache2
</source>
<match web.access>
type mongo
user kiyoto
password heartbleed
database web
collection access
… # host, port, etc.
</match>
Apache log → Fluentd → MongoDB

<match web.access>
type copy
<store>
type mongo
user kiyoto
password heartbleed
database web
collection access
</store>
<store>
type s3
… # aws secret, bucket, etc.
</store>
</match>
Apache log → Fluentd → MongoDB + S3
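The `copy` output above delivers each matched event to every `<store>`. A minimal sketch of that fan-out idea; `copy_to_stores` is a hypothetical function, and collecting per-store errors instead of aborting is an assumption of this sketch, not a statement about how Fluentd's copy plugin handles failures:

```python
def copy_to_stores(event, stores):
    """Write one event to every store; return errors keyed by store name."""
    errors = {}
    for name, write in stores.items():
        try:
            write(event)
        except Exception as exc:  # one failing store should not block the rest
            errors[name] = exc
    return errors
```

Here `stores` maps a name to any callable that accepts an event, for example a MongoDB insert and an S3 upload wrapper.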

{"subtitle": "scalability"}

• Automate monitoring!
• App and System metrics
• JSON everywhere

• 2000+ nodes
• ~1B events/day
• Forwarder-Aggregator

{"subtitle": "Demo",
"need": "Demo Karma"}

<source>
type mongostat
uri "172.17.0.2"
</source>
<match mongostat.*.*>
type mongo
user kiyoto
password heartbleed
database web
collection access
</match>
MongoDB (mongostat) → Fluentd → MongoDB
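The `<match mongostat.*.*>` block relies on tag wildcards: in Fluentd match patterns, `*` stands for exactly one dot-separated tag part and `**` for zero or more parts. A minimal sketch of that matching rule; `tag_matches` is a hypothetical function, and other pattern forms such as `{a,b}` alternatives are not covered:

```python
def tag_matches(pattern, tag):
    """Check a dot-separated tag against a pattern with * and ** wildcards."""
    return _match(pattern.split("."), tag.split("."))


def _match(pattern_parts, tag_parts):
    if not pattern_parts:
        return not tag_parts  # both must be exhausted together
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # ** may swallow zero or more tag parts.
        return any(_match(rest, tag_parts[i:])
                   for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    # * matches any single part; anything else must match literally.
    return (head == "*" or head == tag_parts[0]) and _match(rest, tag_parts[1:])
```

With this rule, `mongostat.*.*` accepts a three-part tag starting with `mongostat` and rejects shorter or longer ones.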

{
"install": "gem install fluentd",
"website": "www.fluentd.org",
"github": "fluent/fluentd",
"twitter": "@fluentd"
}
