
Operationalizing security data science for the cloud: Challenges, solutions, and trade-offs

In most security data science talks that describe a specific algorithm used to solve a security problem, the audience is left wondering: how did they perform system testing when there is no labeled attack data, what metrics do they monitor, and what do these systems actually look like in production? Academia and industry both focus largely on security detection, but the emphasis is almost always on the algorithmic machinery powering the systems. Prior art on productizing these solutions is sparse: the problem has been studied from a machine-learning angle or from a security angle, but rarely jointly. Yet the intersection of operationalizing security and machine-learning solutions matters, not only because security data science solutions inherit complexities from both fields but also because each has unique challenges. For instance, compliance restrictions that dictate that data cannot be exported from specific geographic locations (a security constraint) have a downstream effect on model design, deployment, evaluation, and management strategies (a data science constraint). This talk explores this intersection!



  1. 1. Choosing the Learner: Binary Classification, Regression, Multiclass Classification, Unsupervised, Ranking, Anomaly Detection, Collaborative Filtering, Sequence Prediction, Reinforcement Learning, Representation Learning
  2. 2. The machine learning experimentation workflow: Choosing the Learning Task (binary classification, anomaly detection, ranking) → Defining Data Input (data loaders: text, binary, SVMLight, transpose loader; data type) → Applying Data Transforms (cleaning missing data, handling categorical data, handling text data, data normalization) → Choosing the Learner (binary classification, regression, multiclass, unsupervised, ranking, anomaly detection, collaborative filtering, sequence prediction) → Choosing Output (save the features of the model? save the model as text or as binary? save the per-instance results?) → Choosing Run Options (run locally or distributed on an HPC cluster? are all paths in the experiment node-accessible? priority? max concurrent processes?) → View Results (too large? sample it; right size? load it; per-feature histograms; sampled instances) → Debug and Visualize Errors (errors in the data, the learner, the optimizer, or the experimentation setup) → Analyze Model Predictions (root-cause analysis, grading)
  3. 3. The same experimentation workflow, repeated as a tiled diagram; no additional content.
  4. 4. Operationalizing Security Data Science. Ram Shankar Siva Kumar (@ram_ssk) and Andrew Wicker, Microsoft
  5. 5. Security data science projects are different. Traditional programming projects: spec/prototype → implement → ship. Data science projects: at each stage you relabel, refeaturize, and retrain; with data-driven features, all components drift: the learner (more accurate, faster, lower memory footprint, ...), the features (there are always better ones), and the data (all distributions drift). Security projects: at each stage you assess the threat, build detections, and respond; all components drift here too: threats (new attacks constantly come out), detection (newer log sources), and response (better tooling, newer TSGs, i.e. troubleshooting guides). So wait... when do we ship?
  6. 6. You ship when your solution is operational, and getting there involves more than one discipline: security experts, engineers, legal, service engineers, product managers, and machine learning experts.
  7. 7. Operational is more than "the model is working". The vague goal "detect unusual user activity to prevent data exfiltration" has to become an operational requirement: detect unusual user activity using application logs, with a false positive rate < 1%, for all Azure customers, in near real-time.
  8. 8. Operationalize Security Data Science: Components. The operational requirement "detect unusual user activity using application logs, with a false positive rate < 1%, for all Azure customers, in near real-time" maps onto the components: The Problem => Data => Model Evaluation => Model Deployment => Model Scale-out.
  9. 9. Model Evaluation: How do you know your system works?
  10. 10. (Slide contains only a diagram; no extractable text.)
  11. 11. (Slide contains only a diagram; no extractable text.)
  12. 12. Three families of metrics. Model Evaluation Metrics (e.g., false positive rate): they make your customer, and therefore your business, happy; the challenge is how to measure them. Model Usage Metrics (e.g., call rate): how much is the model actually in use? They make your division happy and are collected by your pipeline after deployment. Model Validation Metrics (e.g., MSE, reconstruction error): how well does the model generalize? They make the data scientist happy and come pre-built with ML frameworks (scikit-learn, CNTK). (A metric-computation sketch follows the slide list.)
  13. 13. Model Evaluation: how do you gather an evaluation dataset? Good: use benchmark datasets (a list of curated datasets lives at www.secrepo.com); the con is that attackers have them too. Better: use previous indicators of compromise (IOCs) from honeypots and commercial IOC feeds; gather confirmed IOCs and "backprop" them through the generated alerts, which lets you calculate false positives and false negatives. Best, and most specialized: curate your own dataset. (An IOC back-propagation sketch follows the slide list.)
  14. 14. Curating your own dataset, option 1: inject fake malicious data. How: label synthetic data as "eviluser" and check whether "eviluser" pops to the top of the reports every day. Pro: low overhead; you don't have to depend on a red team to test your detection. Con: the injected data may not be representative of true attacker activity. (Diagram: synthetic data → storage → model → alerting system. A sketch of the daily check follows the slide list.)
  15. 15. Curating your own dataset, option 2: employ commonly used attacker tools. How: spin up a malicious process using Metasploit, PowerSploit, or Veil in your environment and look for traces in your logs. Pro: easy to implement; your development team, with a little tutorial, can run the tool and generate attack data in the logs. Con: the machine learning system will only learn to detect known attacker toolkits and will not generalize over the attack methodology. (Diagram: tainted data → storage → model → alerting system.)
  16. 16. Curating your own dataset, option 3: a red team pentests your environment. How: a red team attacks the system and we collect the logs from the attacks as tainted data. Pro: the closest technique to real-world attacks. Con: red team engagements are point-in-time exercises, and they are expensive. (Diagram: tainted data → storage → model → alerting system.)
  17. 17. Growing your dataset: Generative Adversarial Networks. Sources: https://medium.com/@devnag/generative-adversarial-networks-gans-in-50-lines-of-code-pytorch-e81b79659e3f#.djcfc6eo0 and http://www.evolvingai.org/ppgn
  18. 18. Model Deployment: tailoring alerts based on customers' geographic location
  19. 19. Azure has data centers all around the world!
  20. 20. Localization affects model building. Privacy laws vary across the board: an IP address is treated as EII in some regions but not in others. Behavioral baselines shift too: "anyone logging into the corporate network at midnight during the weekend is anomalous" breaks down because the weekend in the Middle East is not the weekend in the Americas, and seasonality varies. (A region-aware weekend check is sketched after the slide list.)
  21. 21. Option 1: Shotgun Deployment. How: deploy the same model code across the different regions. Pros: easy deployment, uniform metrics, and a single TSG to debug all service incidents. Cons: you lose macro trends in favor of micro trends, and you risk model-region incompatibility. (Diagram: the same model deployed to Region 1, Region 2, and Region 3.)
  22. 22. Option 2: Tiered Modeling. How: federated models; each region is modeled separately, its results are scrubbed according to compliance laws and privacy agreements, and the scrubbed results are used as input to a "model prime" that collates them to search for global trends. Pros: bespoke modeling for every region, and a balance between micro and macro modeling. Cons: complicated deployment, and depending on the agreements, model prime may not be possible. (Diagram: Region 1/2/3 → Model 1/2/3 → scrubbed results → Model Prime. A tiered-modeling sketch follows the slide list.)
  23. 23. Model Scale-Out: A Case Study
  24. 24. Detecting Malicious Activities. Detect risky or malicious activity in SharePoint Online activity logs, with precision > 90%, for all SPO users, in near real-time => The Problem => Data => Model Evaluation => Model Deployment => Model Scale-out
  25. 25. Exploratory Analysis. Typical data science work: sample the data, write a script for preprocessing it, compute summary statistics, and write a script for evaluating approaches; all of it done locally on a dev machine using R or Python, which gives quick turnaround and avoids having to debug at scale.
  26. 26. Model Evaluation. Labels come from known incidents and investigations, and we also inject labels by mimicking malicious activity: the SPO team helps us understand the malicious activity, and the red team helps us simulate it. Target: > 90% precision.
  27. 27. Model: Bayesian Network. A probabilistic graphical model (related to GMMs, CRFs, and MRFs) that represents variables and conditional independence assertions in a directed acyclic graph: directed edges encode conditional dependencies, and each variable has a conditional probability distribution. (Classic example on the slide: Burglary, Earthquake, Alarm, John Calls, Mary Calls. A sketch follows the slide list.)
  28. 28. Initial Prototype, v0.1. One activity model for all users, run in the cloud on an Azure Worker Role, with storage accounts for the input data and output scores. Pros: easy to manage, small memory footprint. Cons: does not scale, low throughput. (Diagram: data → worker role hosting the single activity model → scores.)
  29. 29. Improved Approach. One model for each user gives personalized activity suspiciousness; low-activity users are clustered for better model results; and the storage accounts are replaced with Azure Event Hubs, low-latency, cloud-scale "queues". (Diagram: users → Event Hub → worker role hosting Model 1 through Model n → Event Hub → scores.)
  30. 30. Model Scale-Out: Memory. Millions of per-user models are more than can fit in worker-role memory, so models are stored in a storage account and loaded as needed. (Diagram: as before, plus model storage.)
  31. 31. Model Scale-Out: Latency. The model storage account adds too much latency, so a Redis cache minimizes model-loading latency, with an LRU policy applied as we process user activity events. (Diagram: as before, plus a Redis cache in front of model storage. An LRU model-cache sketch follows the slide list.)
  32. 32. Data Compliance. Models cannot use certain PII; balkanized cloud environments force tiered model development; user information (UserID -> User Name) is resolved only for the UX.
  33. 33. Data Compliance. (Diagram: the scoring pipeline from before, with a user account DB behind its own Redis cache added for resolving user information.)
  34. 34. Cloud Resource Competition. (Diagram: signals 1 through m all hitting the shared user account DB and its Redis cache.)
  35. 35. Cloud Resource Competition. (Same diagram, repeated.)
  36. 36. From v0.1 to v1.0
  37. 37. Conclusion
  38. 38. Operationalize Security Data Science: Components => Model Evaluation => Model Deployment => Model Scale-out
  39. 39. The Rand Test: a test of whether your security data science solution is operational. Answer yes or no to the following: 1) Do you have an established pipeline to collect relevant security data? 2) Do you have established SLAs/data contracts with partner teams? 3) Can you seamlessly update the model with new features and retrain? 4) Did you evaluate the model with real attack data? 5) Does your model respect the different privacy laws across all regions? 6) Do you account for model localization? 7) Is your model scalable, end to end? 8) Do you hold live-site meetings about your solution? 9) Can security responders leverage the model for insights during an investigation? 10) Do you have a framework to collect feedback from security analysts on the results? Scoring (by @ram_ssk and Andrew Wicker): each Yes = 1 point; 10 means "All systems operational!", 5 means "One small step…", and 0 means "Houston! We have a problem." (A scoring sketch follows the slide list.)
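The sketches below expand on a few of the slides; they are illustrative only. First, for slide 12, a minimal sketch of computing one metric from each family, assuming a labeled evaluation set and serving telemetry; the arrays and counts are hypothetical stand-ins, not the talk's data.

```python
# Hypothetical numbers throughout; the point is only where each metric comes from.
import numpy as np
from sklearn.metrics import confusion_matrix, mean_squared_error

# Model validation metric (makes the data scientist happy): reconstruction error / MSE.
x_true = np.array([0.10, 0.40, 0.90, 0.30])
x_reconstructed = np.array([0.20, 0.35, 0.80, 0.30])
reconstruction_error = mean_squared_error(x_true, x_reconstructed)

# Model evaluation metric (makes the customer happy): false positive rate,
# computable only once you have a labeled evaluation set (slides 13-16).
y_true = np.array([0, 0, 1, 0, 1, 0])  # 1 = confirmed malicious
y_pred = np.array([0, 1, 1, 0, 1, 0])  # 1 = the model raised an alert
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
false_positive_rate = fp / (fp + tn)

# Model usage metric (makes the division happy): call rate, collected by the
# pipeline after deployment rather than by the ML framework.
calls_last_24h = 12_500
call_rate_per_minute = calls_last_24h / (24 * 60)

print(f"reconstruction error: {reconstruction_error:.4f}")
print(f"false positive rate:  {false_positive_rate:.2%}")
print(f"calls per minute:     {call_rate_per_minute:.1f}")
```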
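For slide 13, a minimal sketch of "backprop-ing" confirmed IOCs through the generated alerts to estimate false positives and false negatives; the indicator values and alert tuples are made up, and real alerts would carry richer context.

```python
# Confirmed IOCs from honeypots or commercial feeds (hypothetical values).
confirmed_iocs = {"198.51.100.7", "evil.example.net", "9f86d081884c7d65"}

# Alerts the detection generated over the evaluation window: (alert_id, indicator).
generated_alerts = [
    ("a1", "198.51.100.7"),      # matches a confirmed IOC
    ("a2", "203.0.113.10"),      # no confirmed IOC behind it
    ("a3", "9f86d081884c7d65"),  # matches a confirmed IOC
]

alerted_indicators = {indicator for _, indicator in generated_alerts}
true_positives = [a for a in generated_alerts if a[1] in confirmed_iocs]
suspected_false_positives = [a for a in generated_alerts if a[1] not in confirmed_iocs]
missed_iocs = confirmed_iocs - alerted_indicators  # false negatives: confirmed IOCs never alerted on

print(f"TP={len(true_positives)}  FP(estimate)={len(suspected_false_positives)}  FN={len(missed_iocs)}")
```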
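For slide 14, a minimal sketch of the daily check that the injected "eviluser" account surfaces at the top of the report; the report format and the TOP_N threshold are assumptions.

```python
TOP_N = 10  # hypothetical: how far down the report we are willing to look

def injected_entity_detected(daily_report, entity="eviluser", top_n=TOP_N):
    """daily_report: iterable of (user, anomaly_score); True if the injected
    entity appears among the top_n highest-scoring users."""
    ranked = sorted(daily_report, key=lambda row: row[1], reverse=True)
    return any(user == entity for user, _ in ranked[:top_n])

# Fake report for illustration; in production this would read the alerting system's output.
report = [("alice", 0.21), ("eviluser", 0.97), ("bob", 0.05)]
assert injected_entity_detected(report), "regression: injected malicious data was not flagged"
print("injected 'eviluser' found near the top of today's report")
```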
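For slide 20, a minimal sketch of making the "midnight on the weekend" rule region-aware; the region table is illustrative (Friday-Saturday weekends are common in parts of the Middle East, Saturday-Sunday in the Americas) and the 00:00-04:00 window is an assumption.

```python
from datetime import datetime

# Python's weekday(): Monday=0 ... Sunday=6.
WEEKEND_BY_REGION = {
    "americas":    {5, 6},  # Saturday, Sunday
    "middle-east": {4, 5},  # Friday, Saturday
}

def is_weekend_midnight_login(timestamp: datetime, region: str) -> bool:
    """Flag logins between 00:00 and 04:00 local time on that region's weekend."""
    return timestamp.weekday() in WEEKEND_BY_REGION[region] and timestamp.hour < 4

ts = datetime(2017, 6, 9, 1, 30)  # a Friday, 01:30 local time
print("americas:   ", is_weekend_midnight_login(ts, "americas"))     # False
print("middle-east:", is_weekend_midnight_login(ts, "middle-east"))  # True
```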
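For slide 22, a minimal sketch of the tiered-modeling flow: each region scores its own data, scrubs whatever its compliance rules forbid from leaving the region, and forwards only scrubbed results to a "model prime". The record schema, the exportable-field lists, and the toy model-prime rule are all hypothetical.

```python
# Per-region results produced by the regional models (hypothetical schema).
REGIONAL_RESULTS = {
    "region-1": [{"user_id": "u1", "ip": "198.51.100.7", "score": 0.91}],
    "region-2": [{"user_id": "u7", "ip": "203.0.113.9",  "score": 0.12}],
}

# Fields each region is allowed to export under its privacy agreements.
EXPORTABLE_FIELDS = {
    "region-1": {"score"},               # e.g. IP is treated as EII here, so it is scrubbed
    "region-2": {"user_id", "score"},
}

def scrub(record, allowed_fields):
    return {k: v for k, v in record.items() if k in allowed_fields}

def model_prime(scrubbed_batches):
    """Toy global model: flag regions whose mean score looks unusually high."""
    return {
        region: sum(r["score"] for r in records) / len(records) > 0.8
        for region, records in scrubbed_batches.items()
    }

scrubbed = {
    region: [scrub(r, EXPORTABLE_FIELDS[region]) for r in records]
    for region, records in REGIONAL_RESULTS.items()
}
print(model_prime(scrubbed))  # {'region-1': True, 'region-2': False}
```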
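For slide 27, a minimal sketch of the classic burglary/alarm Bayesian network named on the slide, assuming the pgmpy library (the class is BayesianNetwork in recent releases, BayesianModel in older ones); the conditional probability numbers are the standard textbook values, not anything from the talk.

```python
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Directed edges encode the conditional dependencies in the slide's diagram.
model = BayesianNetwork([
    ("Burglary", "Alarm"), ("Earthquake", "Alarm"),
    ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls"),
])

# One conditional probability distribution per variable (textbook numbers).
cpd_b = TabularCPD("Burglary", 2, [[0.999], [0.001]])
cpd_e = TabularCPD("Earthquake", 2, [[0.998], [0.002]])
cpd_a = TabularCPD("Alarm", 2,
                   [[0.999, 0.71, 0.06, 0.05],   # P(Alarm=0 | Burglary, Earthquake)
                    [0.001, 0.29, 0.94, 0.95]],  # P(Alarm=1 | Burglary, Earthquake)
                   evidence=["Burglary", "Earthquake"], evidence_card=[2, 2])
cpd_j = TabularCPD("JohnCalls", 2, [[0.95, 0.10], [0.05, 0.90]],
                   evidence=["Alarm"], evidence_card=[2])
cpd_m = TabularCPD("MaryCalls", 2, [[0.99, 0.30], [0.01, 0.70]],
                   evidence=["Alarm"], evidence_card=[2])
model.add_cpds(cpd_b, cpd_e, cpd_a, cpd_j, cpd_m)
assert model.check_model()

# "Both John and Mary called -- how likely is a burglary?"
inference = VariableElimination(model)
print(inference.query(["Burglary"], evidence={"JohnCalls": 1, "MaryCalls": 1}))
```

In the case study, the same kind of model scores SharePoint Online user-activity variables rather than burglaries and phone calls; the graph structure and probability tables would of course differ.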
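For slides 30 and 31, a minimal sketch of the per-user model cache with an LRU eviction policy; an OrderedDict stands in for the Redis cache and a plain dict stands in for the model storage account, so the capacities and key names are assumptions.

```python
from collections import OrderedDict

# Stand-in for the model storage account holding millions of per-user models.
MODEL_STORAGE = {f"user-{i}": f"<serialized model for user-{i}>" for i in range(1_000)}

class LRUModelCache:
    """In-process stand-in for the Redis cache in front of model storage."""

    def __init__(self, capacity=100):
        self.capacity = capacity
        self._cache = OrderedDict()

    def get_model(self, user_id):
        if user_id in self._cache:            # cache hit: mark as most recently used
            self._cache.move_to_end(user_id)
            return self._cache[user_id]
        model = MODEL_STORAGE[user_id]        # cache miss: load from model storage
        self._cache[user_id] = model
        if len(self._cache) > self.capacity:  # evict the least recently used model
            self._cache.popitem(last=False)
        return model

cache = LRUModelCache(capacity=2)
for event_user in ["user-1", "user-2", "user-1", "user-3"]:  # stream of activity events
    cache.get_model(event_user)
print(list(cache._cache))  # ['user-1', 'user-3'] -- user-2 was evicted
```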
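Finally, for slide 39, a minimal sketch of scoring the Rand Test; the slide gives verdicts at 10, 5, and 0, so the exact banding between them below is an assumption.

```python
def rand_test(answers):
    """answers: ten booleans, one per Rand Test question; Yes = 1 point."""
    score = sum(1 for answer in answers if answer)
    if score == 10:
        verdict = "All systems operational!"
    elif score >= 5:          # assumed banding for the middle verdict
        verdict = "One small step..."
    else:
        verdict = "Houston! We have a problem"
    return score, verdict

# Example: seven of the ten questions answered Yes.
answers = [True, True, False, True, False, True, True, False, True, True]
print(rand_test(answers))  # (7, 'One small step...')
```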
