The document discusses Eventbrite's data platform and services. It describes Eventbrite as a social event ticketing platform that has doubled revenue yearly. It then summarizes Eventbrite's use of Hadoop, HBase, MongoDB and other technologies to power analytics, recommendations, and real-time data processing. Recommendations are generated using interest graphs and social graphs built from user activity data stored in Hadoop and HBase.
2. Agenda
Eventbrite
Data Products
Data Platform
Recommendations
Questions
3. • A social event ticketing and discovery platform
• 50 Millionth Ticket Sold
• Revenue doubled YOY
• 180 Employees in SoMa, San Francisco
• Solving significant engineering problems
• Data, Infrastructure, Mobile, Web, Scale, Ops, QA
• Firing on all cylinders and hiring blazing fast
www.eventbrite.com/jobs
14. Infrastructure - Sqoozie
• Workflow for MySQL imports to HDFS
• Generate Sqoop commands
• Run these imports in parallel
• Transparent to schema changes
• Include or exclude at the column, data type, or table level
• Data type casting (e.g., tinyint(1) to Integer)
• Distributed Table Imports
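The slides do not show how Sqoozie builds its commands, so the following is only a minimal sketch of the idea under stated assumptions. The schema input (table name mapped to `(column, mysql_type)` pairs, e.g. read from `information_schema` at run time) and the helper names `build_sqoop_import`, `plan_imports`, and `run_imports` are hypothetical. The Sqoop flags used (`--connect`, `--table`, `--columns`, `--target-dir`, `--num-mappers`, `--map-column-java`) are real `sqoop import` arguments. Because Sqoop typically imports MySQL `tinyint(1)` columns as booleans, the sketch forces them to `Integer` with `--map-column-java`.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_sqoop_import(table, columns, connect, target_root,
                       exclude_columns=(), exclude_types=(), mappers=4):
    """Return one `sqoop import` command as an argv list.

    columns: (name, mysql_type) pairs read fresh on every run (assumption),
    so added or dropped columns are picked up without code changes.
    """
    # Column-level and data-type-level filtering.
    keep = [(name, typ) for name, typ in columns
            if name not in exclude_columns and typ not in exclude_types]
    cmd = [
        "sqoop", "import",
        "--connect", connect,
        "--table", table,
        "--columns", ",".join(name for name, _ in keep),
        "--target-dir", f"{target_root}/{table}",
        "--num-mappers", str(mappers),
    ]
    # tinyint(1) would otherwise arrive as a boolean; import it as Integer.
    casts = [f"{name}=Integer" for name, typ in keep if typ == "tinyint(1)"]
    if casts:
        cmd += ["--map-column-java", ",".join(casts)]
    return cmd


def plan_imports(schema, connect, target_root, exclude_tables=(), **opts):
    """One command per table, skipping excluded tables (table-level filter)."""
    return [build_sqoop_import(t, cols, connect, target_root, **opts)
            for t, cols in schema.items() if t not in exclude_tables]


def run_imports(commands, max_parallel=4, runner=subprocess.run):
    """Run import commands concurrently; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: runner(cmd).returncode, commands))
```

With the default `runner`, `run_imports(plan_imports(...))` shells out to a `sqoop` binary on the PATH; passing a different `runner` makes the planner testable without a cluster. A thread pool is enough here because each worker only waits on an external process.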
15. Infrastructure - Blammo
• Raw logs are imported to HDFS via Flume
• Near real-time – about 5 min latency
• Logs are key-value pairs in JSON
• Each log producer publishes its schema in YAML
• Hive schema kept in sync with the YAML schema using Thrift
• Control exclusion and inclusion
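As a rough illustration of how a producer-published schema could drive both the Hive table and field inclusion/exclusion, here is a minimal sketch. It assumes the YAML has already been loaded into a dict, and it skips the Thrift layer mentioned on the slide, deriving the Hive DDL directly from the schema instead. The schema layout, the `exclude` key, the `dt` partition column, and the helper names are all hypothetical; `org.apache.hive.hcatalog.data.JsonSerDe` is the JSON SerDe shipped with Hive's HCatalog, chosen here as an assumption.

```python
import json

# Hypothetical producer schema, shown as the dict a YAML loader would return.
PAGE_VIEW_SCHEMA = {
    "name": "page_view",
    "fields": {"uid": "bigint", "eid": "bigint", "url": "string", "ts": "bigint"},
    "exclude": ["url"],  # fields kept out of the Hive table
}


def included_fields(schema):
    """Schema fields minus the excluded ones, in declaration order."""
    excluded = set(schema.get("exclude", []))
    return [(name, typ) for name, typ in schema["fields"].items()
            if name not in excluded]


def hive_ddl(schema):
    """Build a Hive DDL statement over the producer's JSON log files."""
    cols = ",\n  ".join(f"`{name}` {typ.upper()}"
                        for name, typ in included_fields(schema))
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {schema['name']} (\n  {cols}\n)\n"
        "PARTITIONED BY (dt STRING)\n"
        "ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'"
    )


def project(line, schema):
    """Parse one JSON log line and keep only the included fields."""
    record = json.loads(line)
    return {name: record.get(name) for name, _ in included_fields(schema)}
```

Regenerating the DDL from the same schema file each time a producer changes it is what keeps the two in sync in this sketch; unknown keys in a log line are simply dropped by `project`.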
18. Recommendation Engines
• Interest Graph Based (Your friends who like rock music like you are attending the Eric Clapton event – Eventbrite)
• Social Graph Based (Your friends like Lady Gaga, so you will like Lady Gaga; PYMK (People You May Know) – Facebook, LinkedIn)
• Collaborative Filtering – Item-Item Similarity (You like Godfather, so you will like Scarface – Netflix)
• Collaborative Filtering – User-User Similarity (People who bought a camera also bought batteries – Amazon)
• Item Hierarchy (You bought a camera, so you need batteries – Amazon)
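To make the item-item similarity approach on this slide concrete, here is a minimal sketch using cosine similarity over binary user-to-item data (purchases or attendances). Cosine similarity is one common choice and is an assumption here, not a statement of what Netflix, Amazon, or Eventbrite actually use; the function names are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_item_similarity(user_items):
    """Cosine similarity between items, from {user: set_of_items}."""
    item_users = defaultdict(set)
    for user, items in user_items.items():
        for item in items:
            item_users[item].add(user)
    sims = {}
    items = sorted(item_users)
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            overlap = len(item_users[a] & item_users[b])
            if overlap:
                # |users(a) ∩ users(b)| / sqrt(|users(a)| * |users(b)|)
                s = overlap / sqrt(len(item_users[a]) * len(item_users[b]))
                sims[(a, b)] = sims[(b, a)] = s
    return sims


def recommend(user, user_items, sims, k=3):
    """Score unseen items by summed similarity to the user's items."""
    seen = user_items[user]
    scores = defaultdict(float)
    for (a, b), s in sims.items():
        if a in seen and b not in seen:
            scores[b] += s
    return sorted(scores, key=lambda item: (-scores[item], item))[:k]
```

For example, if two of three camera buyers also bought batteries and one bought a tripod, a user who only bought a camera gets `["batteries", "tripod"]`. Comparing items rather than users keeps the similarity table small when there are far more users than items.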
19. Why Interest?
Events are Social
Events are about Interest
A Dense Graph is Irrelevant
Interests are Changing
20. How do we know your interests?
• We ask you
• Based on your activity
• Events Attended
• Events Browsed
• Facebook Interests
• User interests have to match event categories
• Static
• Machine Learning
• Logistic Regression using MLE
• A sparse matrix is generated using MapReduce
• One model for each interest
21. Model-Based vs Clustering
Item-Item vs User-User
Building the Social Graph is the Clustering Step
Social Graph Recommendation is a Ranking Problem
24. 15M * 260 * 260 = 1.014 Trillion Edges
4 Billion Edges Ranked
Each node is a feature vector representing a user
Each edge is a feature vector representing a relationship
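Treating social graph recommendation as a ranking problem, each candidate edge's feature vector gets a score and only the top neighbors per user are kept. The sketch below uses a linear scoring function with hypothetical feature names and hand-set weights purely for illustration; in practice the weights would come from a trained model, and the deck does not say which model Eventbrite used.

```python
import heapq
from collections import defaultdict

# Hypothetical edge features and weights for a linear scoring model.
WEIGHTS = {"common_events": 0.6, "common_interests": 0.3, "same_city": 0.1}


def score_edge(edge_features, weights=WEIGHTS):
    """Linear score of one relationship's sparse feature vector."""
    return sum(weights.get(f, 0.0) * v for f, v in edge_features.items())


def top_k_neighbors(edges, k=2, weights=WEIGHTS):
    """edges: iterable of (uid, neighbor_uid, feature_dict).

    Returns {uid: [neighbor_uid, ...]} ordered by descending edge score.
    """
    per_user = defaultdict(list)
    for uid, nbr, feats in edges:
        per_user[uid].append((score_edge(feats, weights), nbr))
    # nlargest keeps only k candidates per user instead of sorting all of them.
    return {uid: [nbr for _, nbr in heapq.nlargest(k, cands)]
            for uid, cands in per_user.items()}
```

Keeping only the top k per user is what turns a very large candidate edge set into a much smaller ranked one.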
25. Feature Generation
• Mixed Features
• A series of MapReduce jobs
• Output written to HDFS as flat files, used as input to subsequent jobs
• Orders = Event Attendees
• MAP: eid: uid
• REDUCE: eid:[uid]
• Attendees Social Graph
• Input: eid:[uid]
• MAP: uid_i:[uid], emitted for each attendee uid_i of the event
• REDUCE: uid:[neighbors]
• Interest-based features, user-specific features, graph mining, etc.
• Upload feature values to HBase
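The two jobs above (orders to event attendees, then attendees to a social graph) can be simulated in memory to show the key/value flow. This is a minimal sketch: `run_mapreduce` is a hypothetical stand-in for a Hadoop job (map, group by key, reduce), the record layout is assumed, and the mapper excludes the attendee from their own neighbor list, which is an assumption about the slide's `uid_i:[uid]` step. The HBase upload is not shown.

```python
from collections import defaultdict
from itertools import chain


def run_mapreduce(records, mapper, reducer):
    """In-memory stand-in for one MapReduce job: map, shuffle by key, reduce."""
    groups = defaultdict(list)
    for key, value in chain.from_iterable(mapper(r) for r in records):
        groups[key].append(value)
    return [reducer(k, vs) for k, vs in sorted(groups.items())]


# Job 1: orders -> event attendees (MAP eid: uid, REDUCE eid: [uid])
def orders_map(order):
    yield order["eid"], order["uid"]


def attendees_reduce(eid, uids):
    return eid, sorted(set(uids))  # a user with several orders counts once


# Job 2: event attendees -> attendee social graph (REDUCE uid: [neighbors])
def graph_map(event):
    eid, uids = event
    for uid_i in uids:
        yield uid_i, [u for u in uids if u != uid_i]


def graph_reduce(uid, neighbor_lists):
    return uid, sorted(set(chain.from_iterable(neighbor_lists)))
```

The output of job 1 is fed directly into job 2, mirroring how each job's flat-file output on HDFS becomes the next job's input. Note that `graph_map` emits one record per attendee per event, so its output grows with the square of event size.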