Of Search Lights and Blind Spots: Machine Learning in Cybersecurity

Talk at the Workshop for Robustness of AI Systems Against Adversarial Attacks 2020 (RAISA3)

https://www.skrasser.com/blog/2020/08/31/adversarial-machine-learning-and-robust-classification/

Published in: Data & Analytics

  1. OF SEARCH LIGHTS AND BLIND SPOTS: MACHINE LEARNING IN CYBERSECURITY. SVEN KRASSER, CHIEF SCIENTIST, CROWDSTRIKE. 2020 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  2. WHO?
     • CrowdStrike
       • Endpoint protection & breach prevention
       • Endpoint sensor connecting to Cloud
       • Processing 3 trillion events per week
     • My team: Data Science
       • Malware and threat research
       • Sandbox and dynamic analysis
       • Data engineering
       • Machine Learning research
       • Machine Learning software development
     • Hybrid-Analysis.com
  3. ML IN CYBERSECURITY
  4. LONG-TIME USE BEHIND THE SCENES
  5. SOMETHING CHANGED ~2013
  6. THE DEMOCRATIZATION OF ML: MECHANICS & ENGINEERS* (* loosely quoted from an unattributed ML researcher)
  7. NEW CHALLENGES
     • “ML as panacea”
     • “ML is inherently safe”
     • ML monoculture
     • ML performance is poorly understood
  8. QUANTIFYING THE PROBLEM
  9. PROJECTIONS THROUGH 2022 (Source: Gartner, 2019)
     • 75%: Data governance initiatives not adequately considering AI security risks, resulting in financial loss
     • 30%: Cyberattacks leveraging data poisoning, model theft, or adversarial samples
  10. “DO YOU SECURE YOUR ML SYSTEMS TODAY?”
      • 14%* answered “Yes”
      • * ⅓ of organizations polled are in the cybersecurity space
      • Source: Shankar et al., “Adversarial Machine Learning – Industry Perspectives” (2020)
  11. STATIC ANALYSIS
  12. WHY TALK ABOUT THIS FIELD TODAY?
      • Data is plentiful and unencumbered
      • Challenges translate into other domains
      • Static analysis, while limited, is a cheap workhorse
        • Reducing volume of low-effort attacks
        • Saving compute (and hence dollars) for more complex analysis
      • Pre-execution detection
        • Detection on-the-wire (attachment) and at rest (storage)
  13. [Figure: timeline labeled “AV Update”, “New Malware”, “1 Day”, “AV Update”; axis label “Detection Rate”]
  14. [Slide contains no extractable text]
  15. BASE RATE CHALLENGES
      • 125,000 executables on an average hard disk
      • 20,000 process executions per day
  16. 100% TPR @ 1% FPR
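The base-rate figures on slides 15 and 16 can be turned into a quick back-of-the-envelope calculation. The sketch below assumes that the 1% false positive rate applies uniformly to every benign file or process, and it uses a hypothetical count of 10 malicious files (not a figure from the talk) to illustrate the resulting precision. The function names are illustrative.

```python
def expected_false_positives(benign_count: int, fpr: float) -> float:
    """Expected number of benign items flagged at a given false positive rate."""
    return benign_count * fpr


def precision(malicious: int, benign: int, tpr: float, fpr: float) -> float:
    """Fraction of all detections that are actually malicious."""
    true_pos = malicious * tpr
    false_pos = benign * fpr
    return true_pos / (true_pos + false_pos)


EXECUTABLES_ON_DISK = 125_000        # from slide 15
PROCESS_EXECUTIONS_PER_DAY = 20_000  # from slide 15
TPR, FPR = 1.0, 0.01                 # operating point from slide 16
HYPOTHETICAL_MALICIOUS_FILES = 10    # assumption for illustration only

fp_on_disk = expected_false_positives(EXECUTABLES_ON_DISK, FPR)
fp_per_day = expected_false_positives(PROCESS_EXECUTIONS_PER_DAY, FPR)
disk_precision = precision(HYPOTHETICAL_MALICIOUS_FILES, EXECUTABLES_ON_DISK, TPR, FPR)

print(f"False positives when scanning the disk: {fp_on_disk:.0f}")
print(f"False positives per day from process executions: {fp_per_day:.0f}")
print(f"Precision with {HYPOTHETICAL_MALICIOUS_FILES} malicious files: {disk_precision:.4f}")
```

Even a perfect true positive rate leaves the analyst with far more false alarms than true detections when benign items dominate, which is why a 1% FPR operating point is not usable at these volumes.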
  17. HOW THE GAME WAS PLAYED: Manual evasions and corresponding countermeasures
  18. TRADITIONAL ATTACKER ARSENAL
      • Hashbusting
      • Polymorphism
      • Packing
      • Droppers
      • File Infectors/Hiding in Regular Files
      • Wrapped Scripts
  19. COUNTERING THE ATTACKER
      • Heuristics
      • Static unpacking
      • Deep format inspection
      • Emulation
  20. ① Adversaries focus on traditional evasions, which stick out to ML
      ② Adversaries target ML blind spots
      ③ Adversaries leverage ML for robust evasions
      The panacea “track”
  21. [Figure: Problem Space and Feature Space, with the region of Realizable Files; the feature space is drawn as a grid of values between 0.00 and 1.00]
  22. WORKING IN FEATURE SPACE
      • Choosing a feature space that always produces realizable files
        • Such as specific binary traits that can be added (but not necessarily removed), e.g. Al-Dujaili et al. (2018)
        • Imported function names, resources, sections, strings, digital signature, etc.
      • Similar to how an adversary would attack the model
        • Use a substitute model with such a feature space to attack a blackbox model
        • E.g. MalGAN, Hu and Tan (2017)
      • Create (likely) unrealizable feature vectors with some utility
        • Not a realizable attack but allows better preparing for one
        • Increasing robustness at training time
        • Creating pseudo variants for test time (“new family” scenario)
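The "add but never remove" constraint from slide 22 can be illustrated with a toy binary feature vector. The sketch below is not the method of Al-Dujaili et al. or MalGAN; it is a minimal greedy search against a hypothetical linear model, where the weights, bias, and feature vector are made up for illustration. Features may only flip from 0 to 1, which mirrors traits that can be added to a file without breaking it.

```python
from typing import List, Tuple


def score(weights: List[float], bias: float, x: List[int]) -> float:
    """Linear score; a value above 0 means 'malicious' in this toy model."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def add_only_evasion(
    weights: List[float], bias: float, x: List[int], budget: int
) -> Tuple[List[int], List[int]]:
    """Greedily set absent features to 1, most benign-leaning weight first.

    Existing features are never cleared, so the result stays within the
    add-only (realizable) part of the feature space.
    """
    x_adv = list(x)
    added: List[int] = []
    candidates = sorted(
        (i for i, xi in enumerate(x) if xi == 0 and weights[i] < 0),
        key=lambda i: weights[i],
    )
    for i in candidates:
        if score(weights, bias, x_adv) <= 0 or len(added) >= budget:
            break
        x_adv[i] = 1
        added.append(i)
    return x_adv, added


# Hypothetical model and sample (all values are assumptions).
WEIGHTS = [1.5, 0.75, -0.5, -1.25, 0.25, -1.0]
BIAS = -0.5
x = [1, 1, 0, 0, 1, 0]

x_adv, added = add_only_evasion(WEIGHTS, BIAS, x, budget=3)
print("original score:", score(WEIGHTS, BIAS, x))
print("added features:", added)
print("adversarial score:", score(WEIGHTS, BIAS, x_adv))
```

Vectors produced this way can also serve as extra training samples, which is the robustness use mentioned on the slide.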
  23. WORKING IN PROBLEM SPACE: A look at both realizable and real-world attacks
  24. “CHAFF” ATTACK (Ashkenazy and Zini, 2019)
      • Attack on a security vendor production model deployed on endpoints
      • Unconstrained sparse string-based features
        • “This string exists somewhere in the file”
        • Likely heavily weighted
      • Non-monotonic model
      • Extracting strings from files from the product’s whitelist
      • How to toggle the corresponding features?
        • Add the string somewhere
        • Appending to the end of a Portable Executable (the “overlay”) generally keeps the executable working
        • → All realizable
      • Bypass achieved
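Why position-independent string features are easy to toggle can be shown with a toy model. The sketch below is not the vendor model from slide 24; the strings, weights, and byte blob are hypothetical, and the blob is an in-memory stand-in rather than a real executable. It only demonstrates that a "string exists somewhere in the file" feature fires equally for data appended at the end.

```python
from typing import Dict, List

# Hypothetical string features. Negative weights stand for strings mostly
# seen in benign files, which is what makes the model non-monotonic.
STRING_WEIGHTS: Dict[bytes, float] = {
    b"CreateRemoteThread": 2.0,
    b"VirtualAllocEx": 1.5,
    b"Copyright Example Games": -2.5,
    b"example_engine.dll": -1.5,
}


def extract_features(blob: bytes) -> Dict[bytes, int]:
    """1 if the string occurs anywhere in the blob, else 0."""
    return {s: int(s in blob) for s in STRING_WEIGHTS}


def model_score(blob: bytes) -> float:
    """Toy linear score; above 0 means 'malicious'."""
    return sum(STRING_WEIGHTS[s] * v for s, v in extract_features(blob).items())


def append_strings(blob: bytes, strings: List[bytes]) -> bytes:
    """Append data after the original content (analogous to the overlay)."""
    return blob + b"\x00" + b"\x00".join(strings)


original = b"HEADER\x00CreateRemoteThread\x00VirtualAllocEx\x00"
benign_strings = [s for s, w in STRING_WEIGHTS.items() if w < 0]
modified = append_strings(original, benign_strings)

print("score before:", model_score(original))
print("score after: ", model_score(modified))
```

The original bytes are untouched, yet the score changes sign. A defender can use the same check to measure how much of a model's decision depends on content that an attacker can append freely.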
  25. ML STATIC EVASION COMPETITION: Winning Offensive Solution, Fleshman (2019)
      • Modify malware to bypass 3 non-production research models
        • MalConv (DNN, raw bytes)
        • Non-negative MalConv
        • EMBER (engineered features and LightGBM; Anderson and Roth, 2018)
      • Modified files are verified in a sandbox environment
      • DNN models have only unconstrained features (data anywhere in the file can nudge the score)
      • EMBER has some unconstrained features
        • Byte entropy histogram (continuous features)
        • Strings
      • Data injected in various areas
        • Overlay
        • New sections
        • Empty space at end of sections (alignment)
      • Bypass achieved
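The byte entropy feature on slide 25 is "unconstrained" in the sense that injected data shifts it no matter where the data lands. The sketch below is a simplification: it computes a single whole-buffer Shannon entropy, not EMBER's actual windowed byte-entropy histogram, and the byte buffers are synthetic.

```python
import math
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte distribution, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


# Synthetic stand-in for packed or encrypted content: uniform byte values.
high_entropy_body = bytes(range(256)) * 16
# Injected low-entropy filler, e.g. in the overlay or a new section.
filler = b"\x00" * len(high_entropy_body)

before = byte_entropy(high_entropy_body)
after = byte_entropy(high_entropy_body + filler)

print(f"entropy before injection: {before:.3f} bits/byte")
print(f"entropy after injection:  {after:.3f} bits/byte")
```

Because the feature is continuous, an attacker can tune it gradually by choosing how much filler to add and what it contains.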
  26. LEARNING TO EVADE (Anderson et al., 2018)
      • Reinforcement Learning approach to pick the best sequence of modifications to achieve evasiveness
      • Action space
        • Add import
        • Change section names
        • Create section
        • Appending data to sections
        • New entry point (EP) that jumps to original EP
        • Removing signer info
        • Changing debug info
        • Packing
        • Unpacking
        • Breaking header checksum
        • Add to overlay
        • Etc.
      • Modest evasiveness achieved (but no manual intervention as in previous two approaches)
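The idea of learning a sequence of modifications, as on slide 26, can be sketched with tabular Q-learning on a toy environment. This is not the agent or environment used by Anderson et al.; the three action names are taken from the slide, but their effect on the detector score, the threshold, and the reward values are all hypothetical. No files are modified; the state is just a bitmask of actions applied so far.

```python
import random

ACTIONS = ["add import", "create section", "add to overlay"]
# Hypothetical reduction of a toy detector score (0-100) per action.
SCORE_REDUCTION = [10, 30, 20]
START_SCORE, THRESHOLD, MAX_STEPS = 90, 50, 4


def step(state: int, action: int):
    """Apply an action. Returns (next_state, reward, evaded)."""
    next_state = state | (1 << action)
    reduction = sum(
        r for i, r in enumerate(SCORE_REDUCTION) if (next_state >> i) & 1
    )
    evaded = START_SCORE - reduction < THRESHOLD
    reward = (10 if evaded else 0) - 1  # each modification costs 1
    return next_state, reward, evaded


def train(episodes: int = 2000, alpha: float = 0.5, gamma: float = 0.9, seed: int = 0):
    """Off-policy Q-learning with purely random exploration."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(1 << len(ACTIONS))]
    for _ in range(episodes):
        state = 0
        for _ in range(MAX_STEPS):
            action = rng.randrange(len(ACTIONS))
            next_state, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if done:
                break
    return q


def greedy_sequence(q):
    """Follow the learned Q-values from the unmodified sample."""
    state, path = 0, []
    for _ in range(MAX_STEPS):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        path.append(ACTIONS[action])
        state, _, done = step(state, action)
        if done:
            break
    return path


q_table = train()
best = greedy_sequence(q_table)
print("learned modification sequence:", best)
```

In this toy setup no single action crosses the threshold, so the agent has to learn a two-step combination, which is the property that distinguishes sequence learning from picking one best modification.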
  27. MITIGATING THROUGH REGULARIZATION (Elkind, 2019)
      • Premise
        • We know of several perturbation techniques resulting in realizable attacks
        • We want the model to ignore such modifications without constraining the feature space and reducing expressiveness
      • Pairwise Hidden Regularization
        • Penalize differences in hidden representations $h(\cdot)$ in DNN between original file $x$ and perturbed file $\tilde{x}$
        • $\min_{\theta} \; \mathrm{Loss}(\theta) + \lambda \, \lVert h(x, \theta) - h(\tilde{x}, \theta) \rVert^{2}$
      • Training on perturbed pairs
        • Notionally, perturbed files have a modified overlay (appended data)
        • Other modifications can be implemented accordingly (e.g. adding sections)
      • Models more robust; evasions more expensive
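The regularized objective on slide 27 can be sketched for a single training pair. The code below is a minimal illustration, not Elkind's implementation: it assumes a one-hidden-layer ReLU network, a binary cross-entropy classification loss applied to the original sample only, and a squared Euclidean norm for the penalty. The weights and feature vectors are made up; the third input feature stands in for something an overlay change would affect.

```python
import math
from typing import List


def hidden(x: List[float], W: List[List[float]]) -> List[float]:
    """Hidden representation h(x, theta): one ReLU layer."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W]


def predict(x: List[float], W: List[List[float]], v: List[float]) -> float:
    """Probability of 'malicious' from a sigmoid over the hidden layer."""
    z = sum(vi * hi for vi, hi in zip(v, hidden(x, W)))
    return 1.0 / (1.0 + math.exp(-z))


def pairwise_hidden_loss(
    x: List[float], x_pert: List[float], y: int,
    W: List[List[float]], v: List[float], lam: float,
) -> float:
    """Loss(theta) + lambda * ||h(x, theta) - h(x_pert, theta)||^2."""
    p = predict(x, W, v)
    bce = -(y * math.log(p) + (1 - y) * math.log(1 - p))
    penalty = sum(
        (a - b) ** 2 for a, b in zip(hidden(x, W), hidden(x_pert, W))
    )
    return bce + lam * penalty


# Hypothetical parameters and one (original, perturbed) pair.
W = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
v = [1.0, -1.0]
x = [1.0, 2.0, 0.0]
x_pert = [1.0, 2.0, 1.0]  # same file with appended data, in this toy encoding

plain = pairwise_hidden_loss(x, x_pert, y=1, W=W, v=v, lam=0.0)
regularized = pairwise_hidden_loss(x, x_pert, y=1, W=W, v=v, lam=2.0)
print(f"loss without penalty: {plain:.4f}")
print(f"loss with lambda=2:   {regularized:.4f}")
```

Minimizing the second term pushes the network to map a file and its perturbed variant to nearby hidden representations, so that appended data has less influence on the output.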
  28. CONCLUSIONS
      • Educating decision makers about ML
      • Off-the-shelf guardrails; best practices for safety
      • Cost reduction for the adversary; means to increase it again
      • Opportunity for defenders to achieve higher levels of robustness
      • Detectability; avoid silent failure
  29. sven@crowdstrike.com, @SvenKrasser
