We introduce SenSec, a new mobile system framework that uses passive sensory data to ensure the security of applications and data on mobile devices. SenSec constantly collects sensory data from accelerometers, gyroscopes and magnetometers and constructs a gesture model of how a user uses the device. SenSec calculates the sureness that the mobile device is being used by its owner. Based on the sureness score, mobile devices can dynamically request the user to provide active authentication (such as a strong password), or disable certain features of the device to protect the user's privacy and information security.
In this paper, we model such gesture patterns through a continuous n-gram language model using a set of features constructed from these sensors. We built a mobile application prototype based on this model and used it to perform both user classification and user authentication experiments. User studies show that SenSec can achieve 75% accuracy in identifying the users and 71.3% accuracy in detecting the non-owners, with only 13.1% false alarms.
1. Jiang Zhu, Pang Wu, Xiao Wang, Joy Ying Zhang
Carnegie Mellon University
January 31st, 2013
2. • Monitor and track user behavior on smartphones using various on-device sensors
• Convert sensory traces and other context information to Personal Behavior Features
• Build a continuous n-gram model with these features and use it to calculate Sureness Scores
• Trigger various Authentication Schemes when a certain application is launched
4. [Chart: mobile device loss or theft rates, 0-60%]
• "The 329 organizations polled had collectively lost more than 86,000 devices … with average cost of lost data at $49,246 per device, worth $2.1 billion or $6.4 million per organization."
• "The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.
• Strategy One survey conducted among a U.S. sample of 3,017 adults age 18 years or older, September 21-28, 2010, with an oversample in the top 20 cities (based on population).
5. • Password: a major source of security vulnerabilities; easy to guess, reused, forgotten, or shared
• Application: different applications may have different sensitivities
• Usability: authentication too often, or sometimes too loose
7. [System diagram]
• Sensor Fusion and Segmentation → Quantization / Clustering → Activity Recognition
• Risk Analysis Tree: Certainty of Risk < Application Sensitivity → Application Access Control
8. • Human behavior/activities share some common properties with natural languages
• Meanings are composed from the meanings of building blocks
• An underlying structure (grammar) exists
• Expressed as a sequence (time-series)
• Apply statistical NLP to mobile sensory data
• Information retrieval, machine translation, text categorization, summarization, prediction
9. • Generative language model: P(English sentence | model)
P("President Obama has signed the Bill of …" | Politics) >> P("President Obama has signed the Bill of …" | Sports)
• The LM reflects the n-gram distribution of the training data: domain, genre, topics
• With labeled behavior text data, we can train an LM for each activity type: "walking"-LM, "running"-LM, and classify the activity as the one whose LM assigns the observation the highest probability
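The Politics-vs-Sports idea above can be sketched as a toy generative classifier. This is a minimal illustration, not the system's implementation: the training sentences are made up, and add-one smoothing stands in for whatever estimator a real LM would use.

```python
import math
from collections import Counter

def train_unigram(sentences):
    """Unigram LM as raw word counts plus the total token count."""
    counts = Counter(w for s in sentences for w in s.split())
    return counts, sum(counts.values())

def log_prob(sentence, model, vocab):
    """log P(sentence | model) with add-one smoothing over a shared vocabulary."""
    counts, total = model
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in sentence.split())

# made-up training data for two classes
politics = ["president signed the bill", "congress passed the bill"]
sports = ["the team won the game", "player scored a goal"]
p_model, s_model = train_unigram(politics), train_unigram(sports)
vocab = set(p_model[0]) | set(s_model[0])

# classify by comparing likelihoods under each class LM
query = "president signed a bill"
label = "Politics" if log_prob(query, p_model, vocab) > log_prob(query, s_model, vocab) else "Sports"
# label -> "Politics"
```

The same comparison of per-class likelihoods is what the behavior LMs do, with behavior text in place of English words.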
10. • User behavior at time t depends only on the last n-1 behaviors
• A sequence of behaviors can be predicted from the last n-1 consecutive locations
• Maximum Likelihood Estimation from training data by counting
• MLE assigns zero probability to unseen n-grams
• Incorporate a smoothing function (Katz): discount probability for observed grams, reserve probability for unseen grams
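The MLE-by-counting and smoothing steps above can be sketched as follows. This is a simplified illustration with a made-up location-label sequence, and add-k smoothing stands in for the Katz smoothing named on the slide:

```python
from collections import Counter

def bigram_counts(sequence):
    """Count bigrams and their history unigrams from a label sequence."""
    return Counter(zip(sequence, sequence[1:])), Counter(sequence[:-1])

def mle_prob(bi, uni, prev, cur):
    """MLE by counting: P(cur | prev) = c(prev, cur) / c(prev).
    Zero for unseen bigrams."""
    return bi[(prev, cur)] / uni[prev] if uni[prev] else 0.0

def smoothed_prob(bi, uni, vocab, prev, cur, k=0.5):
    """Add-k smoothing (a simple stand-in for Katz): discounts observed
    bigrams and reserves probability mass for unseen ones."""
    return (bi[(prev, cur)] + k) / (uni[prev] + k * len(vocab))

seq = ["home", "cafe", "office", "cafe", "office", "home"]
bi, uni = bigram_counts(seq)
vocab = set(seq)

mle_prob(bi, uni, "cafe", "office")            # 1.0: cafe is always followed by office
mle_prob(bi, uni, "cafe", "home")              # 0.0: unseen, could trigger a false anomaly
smoothed_prob(bi, uni, vocab, "cafe", "home")  # > 0 thanks to the reserved mass
```

The zero-probability case is exactly the anomaly-alert problem the slide describes; smoothing keeps unseen but plausible transitions from scoring negative infinity.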
11. • Convert the feature vector series to label streams (dimension reduction)
• Use an n-gram to model the label stream of each sensory dimension; both current state and transitions are captured
• Step window with assigned length
Example label streams and resulting behavior text:
A1 A2 A1 A4 / G2 G5 G2 G2 / W2 W1 W2 / P1 P3 P6 P1
→ A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
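A minimal sketch of the quantize-then-window idea for one sensory dimension. The bin edges, window parameters, and collapse-repeats rule are illustrative assumptions; the paper's actual clustering and segmentation are not reproduced here:

```python
def quantize(values, bins):
    """Map raw sensor readings to discrete labels (e.g. A1..A4) by binning --
    a simple stand-in for the clustering step."""
    # label = 1 + number of bin edges the value meets or exceeds
    return [sum(v >= b for b in bins) + 1 for v in values]

def window_tokens(labels, prefix, width, step):
    """Slide a step window over a label stream and emit one token per window,
    collapsing consecutive repeats so each token captures state and transition."""
    tokens = []
    for start in range(0, len(labels) - width + 1, step):
        window = labels[start:start + width]
        dedup = [window[0]] + [l for a, l in zip(window, window[1:]) if l != a]
        tokens.append("".join(f"{prefix}{l}" for l in dedup))
    return tokens

accel = [0.1, 0.9, 2.3, 2.4, 0.2, 0.3]            # made-up acceleration magnitudes
labels = quantize(accel, bins=[0.5, 1.5, 2.0])     # -> [1, 2, 4, 4, 1, 1]
tokens = window_tokens(labels, "A", width=2, step=2)  # -> ["A1A2", "A4", "A1"]
```

Doing the same per sensor (A, G, W, P prefixes) and interleaving the tokens yields the behavior text that the n-gram model consumes.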
12. • Build n-gram models for M users/classes, m = 1, 2, …, M
• Given a behavior text L, estimate the probability that L is generated by the model of user m: P(L | model_m)
• The user classification problem is formulated as m̂ = argmax_m P(L | model_m)
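The argmax rule above can be sketched as follows. The user names, training sequences, and add-k smoothing are illustrative assumptions, not the paper's exact estimator:

```python
import math
from collections import Counter

def train(seq):
    """Per-user model: bigram counts plus history unigram counts."""
    return Counter(zip(seq, seq[1:])), Counter(seq[:-1])

def sequence_log_prob(tokens, model, vocab_size, k=0.5):
    """log P(L | model_m) under an add-k-smoothed bigram model."""
    bi, uni = model
    return sum(
        math.log((bi.get((p, c), 0) + k) / (uni.get(p, 0) + k * vocab_size))
        for p, c in zip(tokens, tokens[1:])
    )

def classify_user(tokens, models, vocab_size):
    """m_hat = argmax_m P(L | model_m): the user whose model best explains L."""
    return max(models, key=lambda m: sequence_log_prob(tokens, models[m], vocab_size))

# made-up behavior-text histories for two users
models = {
    "alice": train(["A1", "A2", "A1", "A2", "A1"]),
    "bob":   train(["A3", "A4", "A3", "A4", "A3"]),
}
classify_user(["A1", "A2", "A1"], models, vocab_size=4)  # -> "alice"
```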
13. • A binary classification problem: classify a user as the owner (â = 1) or not (â = -1)
• Given a behavioral n-gram model and an observation, evaluate the probability that the user is the owner (â = 1) and check whether it exceeds a threshold θ
• Given a sequence of behavior text L and a sensitivity threshold θ, validate whether L was generated by user m
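A sketch of the thresholded decision, using the average per-bigram log probability under the owner's model as the score. The scoring details, toy sequences, and threshold value are assumptions for illustration:

```python
import math
from collections import Counter

def avg_log_prob(tokens, bi, uni, vocab_size, k=0.5):
    """Average per-bigram log probability under an add-k-smoothed owner model."""
    lps = [
        math.log((bi.get((p, c), 0) + k) / (uni.get(p, 0) + k * vocab_size))
        for p, c in zip(tokens, tokens[1:])
    ]
    return sum(lps) / len(lps)

def authenticate(tokens, owner_model, vocab_size, theta):
    """Binary decision: a_hat = 1 (owner) if the score exceeds the
    sensitivity threshold theta, otherwise a_hat = -1."""
    bi, uni = owner_model
    return 1 if avg_log_prob(tokens, bi, uni, vocab_size) >= theta else -1

# made-up owner history
owner_seq = ["A1", "A2", "A1", "A2", "A1", "A2"]
owner = (Counter(zip(owner_seq, owner_seq[1:])), Counter(owner_seq[:-1]))

authenticate(["A1", "A2", "A1"], owner, vocab_size=4, theta=-1.0)  # 1: owner-like
authenticate(["A3", "A4", "A3"], owner, vocab_size=4, theta=-1.0)  # -1: anomalous
```

Raising θ makes the check stricter, which is exactly the trade-off the ROC analysis later in the deck explores.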
14. [Plot: average log probability vs. sliding window position, showing an anomaly at point A, detection at point B under the low threshold, and false positives at points C and D under the high threshold]
15. [Pipeline diagram]
• Sensing → Preprocessing (Feature Construction, Behavior Text Generation) → Modeling (N-gram Model)
• User Classification: user classifier
• User Authentication: binary threshold inference
17. • Accelerometer: used to summarize the acceleration stream; calculated separately for each dimension [x, y, z, m]; meta features: Total Time, Window Size
• GPS: location string from the Google Maps API and mobility path
• WiFi: SSIDs, RSSIs and path
• Applications: bitmap of well-known applications
• Application Traffic Pattern: TCP/UDP traffic pattern vectors: [remote host, port, rate]
18. • Offline data collection (for training and testing):
• Pick up the device from a desk
• Unlock the device using the right slide pattern
• Invoke the Email app from the "Home Screen"
• Lock the device by pressing the "Power" button
• Put the device back on the desk
24. [System diagram revisited: Sensor Fusion and Segmentation → Quantization / Clustering → Activity Recognition; Risk Analysis Tree: Certainty of Risk < Application Sensitivity → Application Access Control]
• Experiments discover anomalous usage with ~80% accuracy using only days of training data
25. • Alpha test in June 2012; first Google Play Store release in October 2012
• False positives: a 13% FPR still annoys users sometimes
• Use an adaptive model: add the trace data shortly before a false positive to the training data and update the model
• Change passcode validation to a sliding pattern
• A false positive grants a "free ride" for a configurable duration
• Assumption: a just-authenticated user should control the device for a given period of time
• The "free ride" period ends immediately if an abrupt context change is detected
• A newer version is scheduled to be released in January 2013
26. • Extended data set for feature construction: TCP/UDP traffic, sound, ambient lighting, battery status, etc.
• Data and modeling: gain more insight into the data, features and factorized relationships among various sensors; try other classification methods and compare results (LR, SVM, Random Forest, etc.)
• Enhanced security of SenSec components: integration with the Android security framework and other applications
• Privacy challenges: data collection, model training, privacy policy, etc.
• Energy efficiency
As a quick overview of our work: in order to enforce application security, we monitor and track user behavior through the traces collected from the on-device sensors. We then convert these traces and other context information into behavior features. We adopt an n-gram model of the user's behavior and use it to monitor and calculate a certainty score. This score is fed to the smartphone's authentication module to enforce the security of various applications and their data on the device.
So, what motivated us to do this project?
Mobile applications and devices are becoming ubiquitous and will increasingly interact with different sensors, services and other mobile users. It is crucial for mobile users to privately and securely interact with their environment and data, and for mobile services to trust the identity of the user. While mobile devices such as smartphones make our lives convenient in ways that were unimaginable before, applications such as email, web browsing, social networks, shopping and online banking know too much about our private lives. Mobility introduces additional security and privacy challenges in being able to provide services in a way that compromises neither the environment of users nor their data. Protecting a user's privacy and ensuring the accountability of mobile applications in a seamless and non-intrusive way poses great challenges to next-generation mobile computing platforms. Recently, a new survey* has revealed that 36 percent of consumers in the United States have either lost their mobile phone or had it stolen. Another survey† has revealed that the 329 organizations polled had collectively lost more than 86,000 devices, with an average cost of lost data of $49,246 per device, worth $2.1 billion or $6.4 million per organization. Given the high loss rate and high cost associated with these losses, accountable schemes are needed to protect the data on mobile devices.
Reliable authentication is an essential requirement for a mobile device and its applications. Today, passwords are the most common form of authentication. This results in two potential problems. First, passwords are a major source of security vulnerabilities, as they are often easy to guess, reused, forgotten, or shared with others, and are susceptible to social engineering attacks. Second, to secure the data and applications on a mobile device, the mobile system would prompt the user for authentication quite often, and this results in serious usability issues. We also observe that different applications on a mobile device may have different sensitivities to the aforementioned threats and data loss. For example, the Angry Birds game on an Android device is less sensitive than the Contact List or Photo Album should the device be operated by an unauthorized user. A one-size-fits-all approach to authentication may be either too loose for some applications, which exposes them to risk, or too tight for others, which causes usability problems.
The commoditization of sensor technologies coupled with advances in modeling user behavior offer us new opportunities for simplifying and strengthening authentication. We envision a new mobile system framework, SenSec, which uses passive sensory data to ensure application security.
SenSec constantly collects sensory data from the accelerometer, gyroscope, GPS, WiFi, microphone, or even camera. By analyzing the sensory data, it constructs the context under which the mobile device is used. This includes locations, movements, usage patterns, etc. From the context, the system can calculate the certainty that the system is at risk. Each application on the mobile device is assigned, either manually or automatically, a sensitivity value. When the user invokes an application, SenSec compares the certainty with the application's sensitivity level. If the sensitivity exceeds the certainty threshold, an authentication mechanism is employed to enforce the security policy for that application.
Think of comparing an unknown play to Shakespeare's known library of works: we track word and phrase patterns in the data and ask whether P(unknown play | known Shakespeare's works) is greater than a threshold. If it is — yeah, we found another of Shakespeare's works. Or plagiarism.
So building along this line, we use a continuous n-gram model to learn the sequence of locations from a user's WiFi traces. The n-gram model works under the assumption that the next location in the sequence depends on just the last n-1 locations. Once the n-gram model is trained, we can use it to calculate the probability of all possible next locations given the past n-1 locations, and see which one is the most likely location. To train the model, we use maximum likelihood estimation on the training sequences to estimate these conditional probabilities, just by counting. As shown in this equation, the MLE probability of being in a location at time i conditioned on the past n-1 history locations is just the count of all such n-sequences in the data divided by the count of the corresponding (n-1)-sequences. There is one small problem with this approach. Let's say our model comes across a location that has not been seen in training. It just assumes a zero probability. This may push the system to trigger an anomaly alert. Luckily, the n-gram model is very robust in handling unseen labels if we use smoothing. Smoothing algorithms such as Katz take some probability mass from the seen locations and reserve it for the unseen locations.
To illustrate this process, let's take a look at an example. The blue curve is the log probability we just described. Let's say an anomaly happens at point A. If we set the threshold lower, like the red line, the system will detect the anomaly at point B with a reasonable delay. But if we set the threshold too high, like the pink line, we will mistakenly flag anomalies for sequences of normal behavior text, which count toward false positives at points C and D. The way to find the right threshold for different applications is to use the receiver operating characteristic curve, or ROC curve. We will look at this in more detail later in the talk.
So, this completes the whole system architecture. We have the sensing part that produces RSS traces, the preprocessing part that converts the traces and other context information into behavior text, and the model training and inference part that performs anomaly detection with a design parameter, the threshold.
Training Stage: Each participant uses the phone for 24 hours while the \\emph{SenSec} app collects sensory information in the background and builds the behavioral $n$-gram model.
Positive Testing Stage: In this stage, each participant continues using her phone for 24 hours. This time the \\emph{SenSec} app is switched to testing mode. It collects the sensor readings the same way as in training mode, but also constructs the behavior text sequence and feeds it to the learned n-gram model. A \\emph{sureness} score is calculated as described in Equation \\ref{eqn_ngram_prob}. If it falls below a preset threshold while certain operations are performed, an authentication screen pops up asking the user to enter a passcode. The \\emph{sureness} score and the authentication decision are recorded for logging and result-reporting purposes.
Negative Testing Stage: The phones are given to other participants, who use them for another 24 hours. As in the previous stage, the same operations are performed on the phone and all authentication events are recorded for further analysis.
We examined the logs generated by these experiments. At each authentication decision point, the sureness score is recorded. By varying the threshold, we can evaluate how well our \\emph{SenSec} models perform user authentication under different threshold values. For each threshold value, we can calculate the False Positive Rate (FPR) and True Positive Rate (TPR) and plot Receiver Operating Characteristic (ROC) curves. TPR represents the cases where the system successfully detects non-owner access, while FPR represents false alarms in which the owner is detected as a non-owner. To ensure the security of the system while still keeping usability at an acceptable level, we need to maximize TPR and minimize FPR.
The ROC curves provide a guideline for choosing the right thresholds to fulfill such a requirement. For example, the system may have a requirement to effectively detect 70\\% of unauthorized access while keeping the false alarm rate below 15\\%. As shown in Figure \\ref{fig_roc}, we can choose a data point from the allowable region on the ROC curve: a threshold of 0.7 can achieve 71.3\\% TPR with only 13.1\\% FPR. We also examined the detection delay at this threshold by deliberately passing the phone between the two users more often and recording the time. By comparing these recorded times with the timestamps of triggered anomalies in the log, we found that \\emph{SenSec} incurs an average detection delay of 4.96 seconds.
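The threshold sweep behind such an ROC curve can be sketched as follows. The score values are fabricated for illustration, and a lower sureness score is treated as more anomalous:

```python
def roc_points(owner_scores, intruder_scores, thresholds):
    """Sweep the decision threshold over recorded sureness scores.
    TPR = fraction of intruder sessions flagged (score below threshold);
    FPR = fraction of owner sessions wrongly flagged."""
    points = []
    for t in thresholds:
        tpr = sum(s < t for s in intruder_scores) / len(intruder_scores)
        fpr = sum(s < t for s in owner_scores) / len(owner_scores)
        points.append((t, fpr, tpr))
    return points

owner = [0.9, 0.8, 0.85, 0.7, 0.95]    # hypothetical owner sureness scores
intruder = [0.3, 0.5, 0.4, 0.75, 0.2]  # hypothetical non-owner scores
roc_points(owner, intruder, [0.4, 0.6, 0.8])
# at threshold 0.6: FPR 0.0, TPR 0.8 -- the kind of operating point picked off the curve
```

Plotting (FPR, TPR) pairs over many thresholds yields the ROC curve from which an operating point like (13.1%, 71.3%) is chosen.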
So now let's see what we conclude from this work and the future work we plan to do.
That brings me to the end of my presentation. Thank you very much for your attention.