Jiang Zhu, Pang Wu, Xiao Wang, Joy Ying Zhang
Carnegie Mellon University
January 31st, 2013




• Monitor and track user behavior on smartphones using various
 on-device sensors
• Convert sensory traces and other context information into Personal
 Behavior Features
• Build a continuous n-gram model from these features and use it to
 calculate Sureness Scores
• Trigger the appropriate Authentication Scheme when a given application
 is launched




[Figure: bar chart titled "Mobile Device Loss or Theft"; vertical axis from 0% to 60%]

• “The 329 organizations polled had collectively lost more than 86,000 devices
  … with average cost of lost data at $49,246 per device, worth $2.1 billion or
  $6.4 million per organization.”
   "The Billion Dollar Lost-Laptop Study," conducted by Intel Corporation and the
   Ponemon Institute, analyzed the scope and circumstances of missing laptop PCs.

Strategy One survey conducted among a U.S. sample of 3,017 adults aged 18 years and older, September 21-28,
   2010, with an oversample in the top 20 cities (based on population).
• Password
   A major source of security vulnerabilities
   Easy to guess, reused, forgotten, shared
• Application
   Different applications may have different sensitivities
• Usability
   Authentication is too frequent, or sometimes too loose
[Figure: SenSec framework. Blocks: Sensor Fusion and Segmentation; Quantization; Clustering; Activity Recognition; Risk Analysis Tree. The Certainty of Risk is compared (<) with the Application Sensitivity to drive Application Access Control.]
• Human behavior and activities share some common properties
  with natural languages
     • Meaning is composed from the meanings of building blocks
     • There is an underlying structure (grammar)
     • Both are expressed as a sequence (time series)

• Apply statistical NLP to mobile sensory data

• Information retrieval, machine translation, text
  categorization, summarization, prediction
• Generative language model: P(English sentence) given a
 model
   P(“President Obama has signed the Bill of … ” | Politics) >>
   P(“President Obama has signed the Bill of … ” | Sports)
   An LM reflects the n-gram distribution of its training data:
   domain, genre, topics.
• With labeled behavior text data, we can train an LM for
 each activity type (“walking”-LM, “running”-LM) and
 classify an activity by the LM that gives its behavior text the highest probability
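
In symbols, and assuming the usual maximum-likelihood decision rule, the classification step might be written as follows. The notation is illustrative rather than taken from the slides: L is the observed behavior text and LM_a is the language model trained for activity a.

```latex
\hat{a} = \arg\max_{a \in \{\text{walking},\, \text{running},\, \dots\}} P\left(L \mid \mathrm{LM}_a\right)
```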




• User behavior at time t depends only on the last n-1 behaviors

• A sequence of behaviors can be predicted from n consecutive
 locations in the past


• Maximum Likelihood Estimation (MLE) from training data by counting n-grams



• MLE assigns zero probability to unseen n-grams
   Incorporate a smoothing function (Katz)
    Discount probability for observed n-grams
    Reserve probability for unseen n-grams
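
The counting estimate and the zero-probability problem can be illustrated with a small bigram (n = 2) sketch. The MLE estimate here is the bigram count divided by the count of its one-label context. Katz back-off is more involved, so this sketch swaps in plain additive (add-alpha) smoothing to show the same idea of discounting observed n-grams and reserving mass for unseen ones. The function names and the toy label sequence are assumptions made for illustration.

```python
from collections import Counter


def train_bigram_counts(labels):
    """Count one-label contexts and bigrams in a label sequence."""
    context_counts = Counter(labels[:-1])
    bigram_counts = Counter(zip(labels[:-1], labels[1:]))
    return context_counts, bigram_counts


def mle_prob(prev, cur, context_counts, bigram_counts):
    """MLE estimate C(prev, cur) / C(prev); zero for unseen bigrams."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, cur)] / context_counts[prev]


def additive_prob(prev, cur, context_counts, bigram_counts, vocab_size, alpha=1.0):
    """Add-alpha smoothing: a simpler stand-in for Katz smoothing (assumption)."""
    return (bigram_counts[(prev, cur)] + alpha) / (
        context_counts[prev] + alpha * vocab_size
    )


# Toy behavior text (hypothetical labels).
behavior_text = ["A1", "A2", "A1", "A4", "A1", "A2"]
ctx, bi = train_bigram_counts(behavior_text)
vocab = sorted(set(behavior_text))

seen = mle_prob("A1", "A2", ctx, bi)      # observed bigram
unseen = mle_prob("A2", "A4", ctx, bi)    # never observed: MLE gives 0.0
smoothed = additive_prob("A2", "A4", ctx, bi, len(vocab))  # small but non-zero
```

With smoothing, the probabilities of all possible next labels after a given context still sum to one, but no transition is ruled out entirely.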




• Convert feature vector series to label streams (dimension reduction)

• Use an n-gram model on the label stream of each sensory
 dimension, capturing both the current state and transitions
• Step window with an assigned length

   Label streams:
      A1 A2 A1 A4
      G2 G5 G2 G2
      W2 W1 W2
      P1 P3 P6 P1

   Behavior text: A2 G2G5 W1 P1P3 A1A4 G2 W1W2 P1
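
As a rough sketch of the conversion described above, feature vectors can be quantized to the label of their nearest cluster centroid, and the resulting label stream can then be cut into step windows. The centroids, the Euclidean distance measure and the helper names are assumptions for illustration; the slides do not specify them.

```python
import math

# Hypothetical centroids for one sensor; in practice they would come from
# clustering the training data.
CENTROIDS = {
    "A1": (0.0, 0.0, 9.8),
    "A2": (3.0, 0.0, 9.0),
    "A4": (0.0, 5.0, 8.0),
}


def quantize(vector, centroids):
    """Return the label of the centroid nearest to the vector."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))


def label_stream(vectors, centroids):
    """Reduce a series of feature vectors to a stream of discrete labels."""
    return [quantize(v, centroids) for v in vectors]


def step_windows(labels, length, step):
    """Return windows of `length` labels, advancing `step` labels each time."""
    return [
        labels[start:start + length]
        for start in range(0, len(labels) - length + 1, step)
    ]


vectors = [(0.1, 0.2, 9.7), (2.8, 0.1, 9.1), (0.0, 0.1, 9.9), (0.2, 4.7, 8.1)]
labels = label_stream(vectors, CENTROIDS)
windows = step_windows(labels, length=2, step=1)
```

Each window is a short piece of behavior text that an n-gram model can score.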

• Build n-gram models for M users/classes, m = 1, 2, 3, …, M




• Given a behavior text L, estimate the probability that L was generated by the model
 of user m




• The user classification problem is formulated as choosing the user whose model best explains L
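
Under the n-gram assumption, one way to write these two steps is given below. This is an assumed standard form with illustrative notation: l_i is the i-th label of L and LM_m is the n-gram model trained for user m.

```latex
\begin{align}
P(L \mid \mathrm{LM}_m) &= \prod_{i=1}^{|L|} P\bigl(l_i \mid l_{i-n+1}, \dots, l_{i-1};\ \mathrm{LM}_m\bigr) \\
\hat{m} &= \arg\max_{m \in \{1, \dots, M\}} P(L \mid \mathrm{LM}_m)
\end{align}
```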




• A binary classification problem: classify a user as the owner
 (â = 1) or not (â = -1)
• Given a behavioral n-gram model



• and an observation r, evaluate the probability that the current user is
 the owner (â = 1) and check whether it exceeds a threshold θ


• Given a sequence of behavior text L and a sensitivity threshold θ,
 validate whether L was generated by user m
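
A minimal sketch of this threshold test, assuming the score is the average log probability of the behavior text under the owner's model. It uses a bigram model with add-alpha smoothing as a simplified stand-in for the smoothed n-gram model on the slides. The function names, the toy traces and the value of θ are hypothetical.

```python
import math
from collections import Counter


def train(labels, alpha=1.0):
    """Return a smoothed bigram probability function for one user's trace."""
    vocab = set(labels)
    ctx = Counter(labels[:-1])
    bi = Counter(zip(labels[:-1], labels[1:]))

    def prob(prev, cur):
        # The extra +1 reserves probability mass for labels never seen in training.
        return (bi[(prev, cur)] + alpha) / (ctx[prev] + alpha * (len(vocab) + 1))

    return prob


def avg_log_prob(prob, labels):
    """Average log probability per transition; needs at least two labels."""
    pairs = list(zip(labels[:-1], labels[1:]))
    return sum(math.log(prob(p, c)) for p, c in pairs) / len(pairs)


def is_owner(prob, labels, theta):
    """Return 1 (owner) if the score reaches the threshold, otherwise -1."""
    return 1 if avg_log_prob(prob, labels) >= theta else -1


owner_model = train(["A1", "A2"] * 20)   # toy training trace
familiar = ["A1", "A2", "A1", "A2"]      # resembles the training trace
unfamiliar = ["A4", "A4", "A1", "A1"]    # transitions never seen in training
THETA = -1.0                             # hypothetical sensitivity threshold
```

Raising θ makes the check stricter, so a more sensitive application would use a higher threshold at the cost of more false positives.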




[Figure: average log probability (vertical axis) versus sliding window position (horizontal axis), plotted with a low threshold and a high threshold; regions A, B, C and D are marked.]
[Figure: SenSec architecture. Stages: Sensing; Preprocessing (Feature Construction, Behavior Text Generation); Modeling (N-gram Model); Inference (Classifier for User Classification; Binary Classifier with a Threshold for User Authentication).]
• Accelerometer
   • Features summarize the
     acceleration stream
   • Calculated separately for each
     dimension [x, y, z, m]
   • Meta features:
      Total Time, Window Size

• GPS: location string from the Google Maps API and mobility path

• WiFi: SSIDs, RSSIs and path

• Applications: bitmap of well-known applications

• Application traffic pattern: TCP/UDP traffic pattern vectors
 [remote host, port, rate]
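
The accelerometer features could be computed along the following lines. The specific statistics (mean and standard deviation) and the reading of "m" as the magnitude of the acceleration vector are assumptions; the slides only state that features are calculated separately for each dimension.

```python
import math
import statistics


def window_features(samples):
    """Summarize one window of (x, y, z) accelerometer samples per dimension."""
    xs, ys, zs = zip(*samples)
    # Assumed meaning of "m": magnitude of the acceleration vector.
    ms = [math.sqrt(x * x + y * y + z * z) for x, y, z in samples]
    features = {}
    for name, values in (("x", xs), ("y", ys), ("z", zs), ("m", ms)):
        features[name + "_mean"] = statistics.fmean(values)
        features[name + "_std"] = statistics.pstdev(values)
    features["window_size"] = len(samples)  # meta feature
    return features


window = [(0, 0, 10), (0, 0, 10), (3, 4, 0), (3, 4, 0)]  # toy samples
features = window_features(window)
```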

• Offline data collection (for training and testing)
    Pick up the device from a desk
    Unlock the device using the correct slide pattern
    Launch the Email app from the Home screen
    Lock the device by pressing the Power button
    Put the device back on the desk
• 71.3% true positive rate with a 13.1% false positive rate
[Figure: SenSec framework. Blocks: Sensor Fusion and Segmentation; Quantization; Clustering; Activity Recognition; Risk Analysis Tree. The Certainty of Risk is compared (<) with the Application Sensitivity to drive Application Access Control.]

• Experiments detect anomalous usage with ~80% accuracy using
 only days of training data
• Alpha test in June 2012; first Google Play Store release in October 2012

• False positives: a 13% FPR still annoys users at times

• Use an adaptive model
   • Add the trace data recorded shortly before a false positive to the training data and
     update the model

• Change passcode validation to a sliding pattern

• A false positive grants a “free ride” for a configurable duration
   • Assumption: a user who has just authenticated should control the device for a given
     period of time

• The “free ride” period ends immediately if an abrupt context change is
  detected
• A newer version is scheduled to be released in January 2013
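
The “free ride” rule above can be sketched as a small timer. The class name, method names and the injectable clock are assumptions made for illustration, not the SenSec implementation.

```python
import time


class FreeRide:
    """Suppress re-authentication for a configurable duration."""

    def __init__(self, duration_s, clock=time.monotonic):
        self.duration_s = duration_s
        self.clock = clock          # injectable so the behavior can be tested
        self.expires_at = None

    def grant(self):
        """Call after the user authenticates following a false positive."""
        self.expires_at = self.clock() + self.duration_s

    def revoke(self):
        """Call when an abrupt context change is detected."""
        self.expires_at = None

    def active(self):
        """True while the free-ride period is running."""
        return self.expires_at is not None and self.clock() < self.expires_at


# Usage with a fake clock (seconds), so no real waiting is needed.
now = [0.0]
ride = FreeRide(duration_s=300, clock=lambda: now[0])
ride.grant()
granted = ride.active()            # free ride is running
ride.revoke()
after_context_change = ride.active()   # ended by the context change
ride.grant()
now[0] = 301.0
after_timeout = ride.active()      # ended because the duration elapsed
```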


• Extended data set for feature construction
   TCP and UDP traffic, sound, ambient lighting, battery status, etc.

• Data and modeling
   Gain more insight into the data, features and factorized relationships among
   various sensors
   Try other classification methods and compare results: LR, SVM, Random
   Forest, etc.

• Enhanced security of SenSec components
   Integration with the Android security framework and other applications

• Privacy challenges
   Data collection, model training, privacy policy, etc.

• Energy efficiency
Thank you.

ICNC 2013 SenSec Presentation


Editor's Notes

  1. As a quick overview of our work: to enforce application security, we monitor and track user behavior through traces collected from the on-device sensors. We then convert these traces and other context information into behavior features. We adopt an n-gram model of the user’s behavior and use it to calculate a certainty score. This score is fed to the smartphone’s authentication module to enforce the security of the applications and data on the device.
  2. So what motivates us to do this project?
  3. Mobile applications and devices are becoming ubiquitous and will increasingly interact with different sensors, services and other mobile users. It is crucial for mobile users to interact privately and securely with their environment and data, and for mobile services to trust the identity of the user. While mobile devices such as smartphones make our lives convenient in ways that were unimaginable before, applications such as email, web browsing, social networking, shopping and online banking know too much about our private lives. Mobility introduces additional security and privacy challenges in providing services in a way that compromises neither the users’ environment nor their data. Protecting a user’s privacy and ensuring the accountability of mobile applications in a seamless and non-intrusive way poses great challenges to next-generation mobile computing platforms. Recently, a survey* revealed that 36 percent of consumers in the United States have either lost their mobile phone or had it stolen. Another survey† revealed that the 329 organizations polled had collectively lost more than 86,000 devices, with an average cost of lost data of $49,246 per device, worth $2.1 billion or $6.4 million per organization. Given the high loss rate and the high cost associated with these losses, accountable schemes are needed to protect the data on mobile devices.
  4. Reliable authentication is an essential requirement for a mobile device and its applications. Today, passwords are the most common form of authentication. This results in two potential problems. First, passwords are a major source of security vulnerabilities: they are often easy to guess, re-used, forgotten or shared with others, and they are susceptible to social engineering attacks. Second, to secure the data and applications on a mobile device, the mobile system prompts the user for authentication quite often, which results in serious usability issues. We also observe that different applications on a mobile device may have different sensitivities to the aforementioned threats and to data loss. For example, the Angry Birds game on an Android device is less sensitive than the Contact List or Photo Album should the device be operated by an unauthorized user. A one-size-fits-all approach to authentication may be either too loose for some applications, which exposes them to risk, or too tight for others, which causes usability problems.
  5. The commoditization of sensor technologies, coupled with advances in modeling user behavior, offers us new opportunities for simplifying and strengthening authentication. We envision a new mobile system framework, SenSec, which uses passive sensory data to ensure application security.
  6. SenSec constantly collects sensory data from the accelerometer, gyroscope, GPS, WiFi, microphone or even the camera. By analyzing the sensory data, it constructs the context in which the mobile device is used, including locations, movements and usage patterns. From the context, the system calculates the certainty that the system is at risk. Each application on the mobile device is assigned a sensitivity value, either manually or automatically. When the user invokes an application, SenSec compares the certainty with that application's sensitivity level. If the sensitivity passes the certainty threshold, an authentication mechanism is employed to enforce the security policy for that application.
  7. Comparing the play to Shakespeare's known library of works: track word and phrase patterns in the data and compute P(unknown play | known Shakespeare's works). Greater than the threshold? Yes, we found another work by Shakespeare. Or, plagiarism.
  8. So, building along this line, we use a continuous n-gram model to learn the sequence of locations from the user's WiFi traces. The n-gram model works under the assumption that the next location in the sequence depends on just the last n-1 locations. Once the n-gram model is trained, we can use it to calculate the probability of every possible next location given the past n-1 locations, and see which one is the most likely. To train the model, we use maximum likelihood estimation on the training sequences to estimate these conditional probabilities, just by counting. As shown in this equation, the MLE probability of being in a location at time i, conditioned on the past n-1 history locations, is just the count of that length-n sequence in the data divided by the count of its length-(n-1) history. There is one small problem with this approach. Let's say our model comes across a location that was not seen in training. It simply assigns it zero probability, which may push the system to trigger an anomaly alert. Luckily, the n-gram model is very robust in handling unseen labels if we use smoothing. Smoothing algorithms such as Katz… take some probability mass from the seen locations and reserve it for the unseen ones.
  9. To illustrate this process, let's take a look at an example. The blue curve is the log probability we just described. Let's say an anomaly happens at point A. If we set the threshold lower, like the red line, the system will detect the anomaly at point B with a reasonable delay. But if we set the threshold too high, like the pink line, we will mistakenly flag an anomaly for a sequence of normal behavior text, which counts towards false positives at points C and D. The way to find the right threshold for different applications is to use the receiver operating characteristic (ROC) curve. We will look at this in more detail later in the talk.
  10. So, this completes the whole system architecture. We have the sensing part that produces RSS traces, the preprocessing part that converts the traces and other context information to behavior text, and the model training and inference part that performs anomaly detection with a design parameter, the threshold.
  11. Training Stage: Each participant uses the phone for 24 hours while the SenSec app collects sensory information in the background and builds the behavioral n-gram model. Positive Testing Stage: In this stage, each participant continues to use her phone for 24 hours. This time the SenSec app is switched to testing mode. It collects the sensor readings the same way as in training mode, but also constructs the behavior text sequence and feeds it to the learned n-gram model. A sureness score is calculated as described in the n-gram probability equation. If it falls below a preset threshold while a certain operation is performed, an authentication screen pops up asking the user to enter a passcode. The sureness score and the authentication decision are recorded for logging and result reporting purposes. Negative Testing Stage: The phones are given to other participants, who use them for another 24 hours. As in the previous stage, the same operations are performed on the phone and all authentication events are recorded for further analysis. We examined the logs generated by these experiments. At each authentication decision point, the sureness score is recorded. By varying the threshold, we can evaluate how well our SenSec models perform user authentication under different threshold values. For each threshold value, we can calculate the False Positive Rate (FPR) and True Positive Rate (TPR) and plot Receiver Operating Characteristic (ROC) curves. TPR represents the cases in which the system successfully detects non-owner access, while FPR represents the false alarms in which the owner is detected as a non-owner. To ensure the security of the system while still keeping usability at a certain level, we need to maximize TPR and minimize FPR.
The ROC curves provide a guideline for choosing the right thresholds to fulfill such a requirement. For example, the system may be required to detect 70% of unauthorized accesses while keeping the false alarm rate below 15%. As shown in the ROC figure, we can choose a data point from the allowable region on the ROC curve: a threshold of 0.7 achieves 71.3% TPR and only 13.1% FPR. We also examined the detection delay at this threshold by deliberately passing the phone between the two users more often and recording the time. By comparing these recorded times with the timestamps of triggered anomalies in the log, we found that SenSec has an average detection delay of 4.96 seconds.
  12. So now let's see what we conclude from this work and the future work we plan to do.
  13. So now let's see what we conclude from this work and the future work we plan to do.
  14. That brings me to the end of my presentation. Thank you very much for your attention.
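
The counting-based training in note 8, the threshold test in note 9 and the TPR/FPR sweep in note 11 can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the SenSec implementation: the function names (`train_ngram`, `cond_prob`, `sureness_score`, `roc_points`) are hypothetical, add-one style smoothing stands in for the Katz smoothing mentioned in the talk, and the score is assumed here to be the average log probability of a behavior sequence.

```python
import math
from collections import Counter


def train_ngram(sequence, n):
    """Count n-grams and their (n-1)-label histories in a training sequence."""
    ngrams, contexts = Counter(), Counter()
    for i in range(len(sequence) - n + 1):
        gram = tuple(sequence[i:i + n])
        ngrams[gram] += 1
        contexts[gram[:-1]] += 1
    return ngrams, contexts, set(sequence)


def cond_prob(model, context, label, alpha=1.0):
    """P(label | context) with additive smoothing (an assumption, not Katz).

    Without the alpha terms this is the MLE ratio count(n-gram) / count(history).
    One extra vocabulary slot is reserved for labels never seen in training.
    """
    ngrams, contexts, vocab = model
    context = tuple(context)
    slots = len(vocab) + 1
    return (ngrams[context + (label,)] + alpha) / (contexts[context] + alpha * slots)


def sureness_score(model, sequence, n, alpha=1.0):
    """Average log probability of a sequence (assumed form of the score)."""
    if len(sequence) < n:
        raise ValueError("sequence must contain at least n labels")
    logs = [
        math.log(cond_prob(model, sequence[i - n + 1:i], sequence[i], alpha))
        for i in range(n - 1, len(sequence))
    ]
    return sum(logs) / len(logs)


def roc_points(owner_scores, intruder_scores, thresholds):
    """For each threshold, flag scores below it as anomalies.

    TPR: fraction of non-owner scores flagged (correct detections).
    FPR: fraction of owner scores flagged (false alarms).
    """
    points = []
    for t in thresholds:
        tpr = sum(s < t for s in intruder_scores) / len(intruder_scores)
        fpr = sum(s < t for s in owner_scores) / len(owner_scores)
        points.append((t, fpr, tpr))
    return points
```

For example, a bigram model (`n=2`) trained on a toy location sequence such as home, cafe, office, cafe, home gives a familiar route a higher score than a route made only of unseen locations, and sweeping `thresholds` over the recorded scores yields the (FPR, TPR) pairs from which an operating point can be picked.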