Adience's VP Development, Pablo Rosenman, presents our architecture and design patterns for distilling behavioural signals from mobile devices into user insights.
Adience - Turning low level behavioural signals into user profiles
1. Turning Low Level Behavioural Signals Into User Profiles
Pablo Rosenman, VP Development
2. Adience
Leading the user-centric mobile revolution
- Harness Deep Learning to profile mobile app users
- Distill user/app interaction to actionable segmentation data
5. Adience SDK
- Runs on tens of millions of devices
- Runs in the background, without interfering with the device's operation
- Collects raw data from the system and environment (according
to available permissions)
- Reduces dimensionality and anonymizes the data
- Sends results to the SDK Server
6. SDK Server
- Receives tens of millions of data submissions from the Mobile
SDK installations per day
- It should be able to scale by two orders of magnitude
- It should handle requests quickly, so as not to hang the client
(i.e. Mobile SDK)
- It should avoid losing data
7. SDK Server (Architecture)
- Data is sent from the mobile SDK to an Apache server running
on EC2
- SDK Server verifies validity of incoming data
- Incoming data gets written immediately (no processing) to S3
[Diagram: Mobile Client -> Amazon EC2 -> Amazon S3]
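The "validate, then write immediately" step on this slide can be sketched as follows. This is only an illustration under stated assumptions: the required field names, the key layout, and the `store` dictionary standing in for an S3 bucket are all hypothetical and do not come from the talk.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical minimal schema; the real report format is not described in the talk.
REQUIRED_FIELDS = {"device_id", "sdk_version", "payload"}


def is_valid_report(body: bytes) -> bool:
    """Cheap validity check: a well-formed JSON object carrying the required keys."""
    try:
        report = json.loads(body)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return False
    return isinstance(report, dict) and REQUIRED_FIELDS.issubset(report)


def handle_submission(body: bytes, store: dict, now: datetime) -> int:
    """Validate, then persist the raw bytes untouched. Returns an HTTP-style status."""
    if not is_valid_report(body):
        return 400
    # Assumed key layout: one prefix per hour, one object per submission.
    key = f"reports/{now:%Y/%m/%d/%H}/{uuid.uuid4().hex}.json"
    store[key] = body  # stand-in for an S3 put; no processing happens here
    return 200
```

Keeping the request path this thin is what lets the server answer quickly; all heavier work is deferred to later stages.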
8. SDK Server (Scaling)
- The ELB balances the load on all the servers
- Auto Scaling will make sure there are enough servers to
handle the load
[Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3]
9. Insights Workflow
- Create insights about the device's owner when new data arrives from the device
- Doesn't have to be real-time (as the data arrives), but shouldn't lag far behind
10. Insights Workflow (cont.)
- Data report sent by SDK consists of:
- Simple data points requiring simple statistical and arithmetic operations, for example:
- Device model
- OS version
- More complex data matrices requiring matrix operations,
for example:
- Machine Learning features on time series data
- Machine Learning features on photos
11. Insights Workflow (Architecture)
- Simple pattern for streamlined processing server application:
- Read input S3 filename from input SQS
- Read the file from the input S3 bucket, and process it
- Write results to file in output S3 bucket
- Send output S3 filename to output SQS
[Diagram: input Amazon SQS + input S3 Bucket -> EC2 Servers (Auto Scaling) -> output S3 Bucket + output Amazon SQS]
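The four-step pattern on this slide can be sketched as a single worker loop. To keep the sketch runnable without AWS, the queues are `collections.deque` objects and the buckets are dictionaries; these stand-ins, and the function name `run_worker`, are assumptions for illustration only.

```python
from collections import deque
from typing import Callable, Deque, Dict


def run_worker(
    in_queue: Deque[str],
    in_bucket: Dict[str, bytes],
    out_bucket: Dict[str, bytes],
    out_queue: Deque[str],
    process: Callable[[bytes], bytes],
) -> int:
    """Drain the input queue, applying the four steps to each message."""
    handled = 0
    while in_queue:
        name = in_queue.popleft()          # 1. read the input filename from the input queue
        result = process(in_bucket[name])  # 2. read the file from the input bucket and process it
        out_bucket[name] = result          # 3. write the result to the output bucket
        out_queue.append(name)             # 4. send the output filename to the output queue
        handled += 1
    return handled
```

Because each stage only talks to queues and buckets, stages can be chained by pointing one stage's output queue at the next stage's input queue.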
12. Insights Workflow (Architecture)
- Aggregate the data from all reports to a single device object
- Create insights from all the device’s aggregated data
- Advantages of architecture:
- Scalability
- Decoupling
[Diagram components: Reports S3 Bucket, Amazon SQS queues, Deep Learning Servers (GPU), Devices Servers, Devices S3 Bucket, Insights Servers, Insights DynamoDB Table]
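Aggregating all reports into a single device object could look roughly like the sketch below. The report shape (`device_id`, `timestamp`, `data`) and the "latest value wins" merge rule are assumptions made for the example; the talk does not specify either.

```python
from typing import Dict, Iterable


def aggregate_reports(reports: Iterable[dict]) -> Dict[str, dict]:
    """Fold reports into one object per device; later reports overwrite earlier fields."""
    devices: Dict[str, dict] = {}
    # Process in time order so that the newest value of each field survives.
    for report in sorted(reports, key=lambda r: r["timestamp"]):
        device = devices.setdefault(report["device_id"], {"report_count": 0})
        device.update(report["data"])
        device["report_count"] += 1
    return devices
```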
14. Events SDK
- Receives events based on user interaction with the app
- Some events are reported automatically (e.g. the app was started)
- Custom events are the real driving force (e.g. the user made an in-app purchase for $3.99)
- Events should be sent to the Events Server
15. Events Server
- Receives hundreds of millions of data submissions from the
Mobile SDK installations per day
- It should be able to scale by two orders of magnitude
- It should handle requests quickly, so as not to hang the client
- Analytics engine should work on all data from the last 30 days
- Data should be enriched with the user insights
16. Events Server (Architecture)
- All incoming events are written to a file in the local volume
- Once every hour, we close the file in each instance and ship it
to S3
[Diagram: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) with Amazon EBS volume -> logrotate -> Amazon S3]
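The hourly "close the file and ship it" behaviour can be illustrated with a small sketch. The talk uses logrotate on a local volume; here an in-memory list stands in for the local file and a callback stands in for the upload to S3. The class name, the key layout, and the choice to rotate on the first event of a new hour (rather than on a timer) are assumptions of this example.

```python
from datetime import datetime


class HourlyEventLog:
    """Buffers event lines for the current hour and ships the buffer when the hour changes."""

    def __init__(self, instance_id, ship):
        self.instance_id = instance_id
        self.ship = ship          # callable(name, lines); stand-in for the upload to S3
        self.current_name = None  # name of the file currently being written
        self.lines = []           # stand-in for the file on the local volume

    def append(self, event_line, ts):
        name = f"events/{ts:%Y/%m/%d/%H}/{self.instance_id}.log"
        if self.current_name is not None and name != self.current_name:
            # The hour rolled over: close the previous file and ship it.
            self.ship(self.current_name, self.lines)
            self.lines = []
        self.current_name = name
        self.lines.append(event_line)
```

Including the instance identifier in the name keeps files from different servers from colliding in the shared bucket.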
17. Insights MapReduce
- At the end of each day, all events from that day are in the
events S3 bucket
- We add to these a “mock event” per report sent to the SDK
Server
- Ultimately, we want to compare all of an app's users over the last 30 days with a subset of those users
18. Insights MapReduce (cont.)
- Using Amazon EMR, we aggregate the data per app, device,
day, and event type
- Example: device 0123, on 2016-01-04, in app Blappy Fird,
purchased in-app goods worth a total of $100
[Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket]
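The Raw2Daily aggregation is a group-by over (app, device, day, event type). The sketch below shows the idea in plain Python rather than as an EMR job; the event field names and the ISO timestamp format are assumptions for illustration.

```python
from collections import defaultdict


def raw_to_daily(events):
    """Sum event values per (app, device, day, event type)."""
    totals = defaultdict(float)
    for event in events:
        day = event["timestamp"][:10]  # assumes ISO timestamps, e.g. "2016-01-04T09:15:00"
        key = (event["app"], event["device"], day, event["type"])
        totals[key] += event.get("value", 0.0)  # events without a value count as 0
    return dict(totals)
```

With two purchases of $60 and $40 by device 0123 in Blappy Fird on 2016-01-04, the record for that key totals $100, matching the example on the slide.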
19. Insights MapReduce (cont.)
- Using the Daily data for the last 30 days, we run an additional EMR job to aggregate per app, device, and event type
- Example: device 0123, in app Blappy Fird, purchased in-app
goods worth a total of $1000 (in the last 30 days)
- We enrich the data by adding the device’s insights to each record
[Diagram: Daily S3 Bucket + Insights DynamoDB Table -> Daily2Aggregate (Amazon EMR) -> Aggregate S3 Bucket]
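The Daily2Aggregate step rolls the daily records up over a 30-day window and attaches each device's insights. A minimal sketch in plain Python, assuming daily records keyed by (app, device, ISO day, event type) and an `insights` dictionary keyed by device; both shapes are hypothetical.

```python
from datetime import date, timedelta


def daily_to_aggregate(daily, insights, today, window_days=30):
    """Sum daily totals per (app, device, event type) over the window, then enrich."""
    start = today - timedelta(days=window_days - 1)  # window includes today
    totals = {}
    for (app, device, day, event_type), value in daily.items():
        if start <= date.fromisoformat(day) <= today:
            key = (app, device, event_type)
            totals[key] = totals.get(key, 0.0) + value
    return [
        {
            "app": app,
            "device": device,
            "type": event_type,
            "total": total,
            "insights": insights.get(device, {}),  # enrichment step
        }
        for (app, device, event_type), total in totals.items()
    ]
```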
20. Insights MapReduce (cont.)
- Accessing DynamoDB for every record is costly
- We know the previous day's users, so we save their insights to an in-memory cache
[Diagram: Insights Servers -> Insights ElastiCache; Daily S3 Bucket, Insights ElastiCache and Insights DynamoDB Table -> Daily2Aggregate (Amazon EMR) -> Aggregate S3 Bucket]
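One way to picture the caching idea is a cache that is warmed with the previous day's users and falls back to the table on a miss. In this sketch plain dictionaries stand in for both ElastiCache and the DynamoDB table, and the class and method names are hypothetical; how the cache is actually populated in the presented system is not stated beyond the slide.

```python
class InsightsCache:
    """In-memory cache in front of a slower insights table."""

    def __init__(self, table):
        self.table = table    # stand-in for the DynamoDB table
        self.cache = {}       # stand-in for ElastiCache
        self.table_reads = 0  # counts the expensive lookups

    def _read_table(self, device_id):
        self.table_reads += 1
        return self.table.get(device_id, {})

    def warm(self, device_ids):
        """Preload insights for devices known to have been active (e.g. yesterday's users)."""
        for device_id in set(device_ids):
            self.cache[device_id] = self._read_table(device_id)

    def get(self, device_id):
        """Serve from the cache; fall back to the table once on a miss."""
        if device_id not in self.cache:
            self.cache[device_id] = self._read_table(device_id)
        return self.cache[device_id]
```

After warming, repeated lookups for the same device during enrichment cost no further table reads.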
21. Insights MapReduce (cont.)
- Using the Aggregate data for the last 30 days, we run an additional EMR job to aggregate per app, country, age, gender, and subset type
- Example: app Blappy Fird, in the US, for males aged 25-34
who purchased in-app goods worth a total of more than
$500 (in the last 30 days), 70% are tech savvy, 40% are
commuters, etc.
[Diagram: Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket]
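The subset statistics in the example ("70% are tech savvy") are shares of devices in a subset that carry a given insight. The sketch below computes such shares for one app and one spend threshold only; the country, age and gender dimensions are left out for brevity, and the record shape and trait names are assumptions.

```python
def subset_trait_shares(records, app, min_total, traits):
    """Share of devices with each trait among an app's devices whose total exceeds min_total."""
    subset = {
        r["device"]: r["insights"]
        for r in records
        if r["app"] == app and r["total"] > min_total
    }
    if not subset:
        return {}
    return {
        trait: sum(1 for ins in subset.values() if ins.get(trait)) / len(subset)
        for trait in traits
    }
```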
23. Insights MapReduce (cont.)
- How can we show data on apps that haven't integrated our SDK?
- Create a mock event per app that we know is installed on the device!
[Diagram: Events S3 Bucket + Mock Events S3 Bucket -> Raw2Daily (Amazon EMR) -> Daily S3 Bucket -> Daily2Aggregate (Amazon EMR, with Insights DynamoDB Table) -> Aggregate S3 Bucket -> Aggregate2Subset (Amazon EMR) -> Subset S3 Bucket]
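Generating mock events from a report's list of installed apps could be as simple as the sketch below. The report fields (`device_id`, `timestamp`, `installed_apps`) and the `"mock"` event type are hypothetical names chosen for the example.

```python
def mock_events_for_report(report):
    """Emit one zero-value mock event per app known to be installed on the device."""
    return [
        {
            "app": app,
            "device": report["device_id"],
            "timestamp": report["timestamp"],
            "type": "mock",  # assumed marker so real and mock events stay distinguishable
            "value": 0.0,
        }
        for app in report["installed_apps"]
    ]
```

Since the mock events share the shape of real events, they can flow through the same Raw2Daily, Daily2Aggregate and Aggregate2Subset steps unchanged.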
26. SDK Server (Next Generation)
[Diagram, current: Mobile Client -> Elastic Load Balancer -> Amazon EC2 (Auto Scaling) -> Amazon S3]
[Diagram, next generation: Mobile Client -> Amazon API Gateway -> AWS Lambda -> Amazon S3]
30. ELK with Amazon
- Server code sends logs to a local ZMQ process
- The ZMQ process then sends them asynchronously to Kinesis
- Logstash pulls from the Kinesis stream and writes in batches to ElasticSearch
[Diagram: Server Code -> ZMQ -> Amazon Kinesis -> Logstash -> Amazon ElasticSearch]
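The two ideas on this slide, handing log lines off without blocking the server code and writing downstream in batches, can be illustrated in-process with a queue and a background thread. This sketch does not use ZMQ, Kinesis or Logstash; the function name, the `None` shutdown sentinel, and the batch size are assumptions for the example.

```python
import queue
import threading


def start_log_forwarder(send_batch, batch_size=3):
    """Start a background thread that forwards queued log lines in batches.

    Returns (log_queue, thread). Put None on the queue to flush and stop.
    """
    log_queue = queue.Queue()

    def worker():
        batch = []
        while True:
            item = log_queue.get()
            if item is None:          # shutdown sentinel
                break
            batch.append(item)
            if len(batch) >= batch_size:
                send_batch(batch)     # stand-in for the downstream batch write
                batch = []
        if batch:
            send_batch(batch)         # flush whatever is left on shutdown

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return log_queue, thread
```

The caller only pays for a `put` on the queue; the slower downstream write happens on the worker thread.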