Mining Software Vulns in SCCM / NIST’s NVD
THE ROCKY ROAD TO DATA NIRVANA
Overview
• “Pleased to meet you”
• The Playground
• Challenge #1: Complex Data Structures
• Challenge #2: “Dirty” unstructured data
• Challenge #3: People issues
• Lessons learned + Demo
Who I am
• Technical Security Architect at Ubisoft
• Previous: 2 large financial institutions, a major retailer, a world-class telco, service bureaus
• Generalist with a passion for all things “technical security”
Disclaimer
“Opinions expressed as well as the content of this presentation are the responsibility of the author. They do not represent Ubisoft company policy or views.”
The Playground: “Find the panda”
• 10K+ team members
• 26 studios in 18 countries
• Windows-centric
• Creativity Rules!
Where is the vulnerable non-Microsoft software installed?
The Great Idea
Microsoft’s SCCM: Reliable production software inventory
NIST’s NVD database: Up-to-date vulnerability data
Effective Patch Management
The Great Idea: Why?
• Avoids expen$ive licensing by using free public software
• Vuln data can become a JSON feed into SIEM or DFIR “big data” mining app
• Do the “impossible” with leading-edge technologies
Challenge # 1
COMPLEX DATA STRUCTURES
MS’ System Center Configuration Mgr
• “The application people love to hate”
• Indispensable for management of enterprise-scale Windows-centric environment
• Back-end MS-SQL database: 1600+ tables, 6200+ views
• Distributed component design leveraging WMI
• On-premises deployment: complex architecture
SCCM Components
• 50+ components!!!
• DLLs running (mostly) as threads, also some separate services
• Communication:
• In-core queues
• Flat files stored in inboxes / outboxes
SCCM and WMI
SMS was the original WMI client
“Everything” is architected using WMI:
• Client-side
• Internal control of agent operations
• Discovery of hardware inventory
• Server-side
• SMS Provider is a WMI provider
• Exposes important database objects as WMI objects
• ConfigMgr Console, SCCM auxiliary applications and tools are implemented as WMI Mgmt Applications.
SCCM Discovery - I
• Populates inventory data in SCCM database
• 6 different methods
• Which are enabled depends on site configuration
• 4 methods target AD
• 1 searches the surrounding network
• 1 interacts with the SCCM client
SCCM Discovery - II
• AD Forest Discovery: IP subnets, AD sites
• AD Group discovery: AD groups and memberships
• AD User discovery: User accounts, AD attributes
• AD System discovery: Computer discovery
• Heartbeat discovery:
• Enabled by default + must be enabled → Are clients healthy and reachable?
• “creates discovery data records (DDRs) containing information about the client including network location, NetBIOS name, and operational status.”
• Every 7 days by default.
• Network discovery: Search domains, SNMP services, DHCP servers. Disabled by default.
SCCM Discovery - III
“Garbage In – Garbage Out”
SCCM Discovery - IV
“Make friends with your SCCM administrator”
• Methods enabled?
• Polling interval?
SCCM Data – “Getting to know you”
“Hands-on” Exploring
• MS SQL Server Management Studio
Use AD to augment host inventory data
• E.g. OU in Distinguished Name
“Google is Your Friend”
• Also Safari Technical Library
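As an illustration of using AD to augment host data, the OU path can be pulled out of a Distinguished Name with a few lines of Python. This is only a sketch: the sample DN is invented, and the naive comma split ignores escaped commas (`\,`) that real DNs may contain.

```python
def ou_path(distinguished_name):
    """Return the OU components of an AD Distinguished Name, outermost first.

    Simplification: splits on every comma, so escaped commas are not handled.
    """
    parts = [p.strip() for p in distinguished_name.split(",")]
    ous = [p[3:] for p in parts if p.upper().startswith("OU=")]
    # A DN lists the innermost container first, so reverse for a top-down path.
    return list(reversed(ous))


# Hypothetical DN, for illustration only
dn = "CN=PC-0042,OU=Workstations,OU=Montreal,DC=corp,DC=example,DC=com"
print(ou_path(dn))
```

The resulting list can be stored next to the host record, e.g. as a site / team label.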
SCCM Data - I
Use Views not Tables
• More stable interface
• Better documentation
• Permissions already in place
• Performance – avoid locking tables
• MS has done the “heavy lifting” e.g. joins, stored procedure definitions
• More Community experience
• This is what MS MVPs say to do
Query SQL not WMI
• More direct, simpler, better performance
SCCM Data II – WMI Underpinnings
• WMI Class Name: “SMS_xxx” → SQL View Name: v_xxx
• WMI Property Names → Column names in SQL Views
• View names > 30 chars are truncated
• Column names have “0” appended to avoid conflicts with SQL reserved words
SCCM Data III – View types
• Inventory data:
• Current: v_GS_< group name >
• History: v_HS_< group name >
• Discovery data:
• WMI scalar properties: v_R_< resource type name >
• WMI array properties: v_RA_< architecture name >_< group name >
SCCM Data III – View types con’t
• v_SchemaViews lists and categorizes ConfigMgr views
SCCM Data IV – Inventory groups / views
• v_GroupMap view lists inventory groups and views
• Each one represents a WMI class configured for inventory collection in client agent settings
DisplayName      InvClassName              InvHistoryClassName       MIFClass
System           v_GS_System               v_HS_System               SYSTEM
Add Remove Pgms  v_GS_ADD_REMOVE_PROGRAMS  v_HS_ADD_REMOVE_PROGRAMS  MICROSOFT|ADD_REMOVE_PROGRAMS|1.0
Memory           v_GS_X86_PC_MEMORY        v_HS_X86_PC_MEMORY        MICROSOFT|X86_PC_MEMORY|1.0
SCCM Data V - Collections
• A Collection is “a logical group of resources in ConfigMgr”
• v_Collection view: Collection meta-data
• “All…” columns – system-wide collections
Name                   Members
All Systems            25106
All Users              22903
All Unknown Computers  8
All Windows Clients    20630
All Windows Servers    3610
SCCM Data VI – Which view to use?
• v_R_System
• From AD / Network / Heartbeat Discovery
• Resource_ID
• NetBIOS name, OS, AD domain
• 60+ fields
• v_GS_System
• Updated when Hardware Inventory runs
• Less accurate – host must have active agent and be scheduled for hardware inventory
• 10 fields
SCCM Data : TL;DR
In most production contexts, the relevant views are:
• v_R_System
• Host / user data
• v_GS_ADD_REMOVE_PROGRAMS
• v_GS_ADD_REMOVE_PROGRAMS_64
• Updated when Hardware Inventory runs
• Installed software registry data
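A rough sketch of the kind of query these views support is shown below. It runs against an in-memory SQLite stand-in so that it is self-contained; the real target would be the SCCM MS-SQL database. Publisher0 and DisplayName0 appear elsewhere in this deck, while ResourceID, Netbios_Name0 and Version0 are assumed column names that should be checked against the view schema documentation. The sample rows are invented.

```python
import sqlite3

# Join host data to installed-software data and drop Microsoft products early.
QUERY = """
SELECT s.Netbios_Name0, arp.Publisher0, arp.DisplayName0, arp.Version0
FROM v_R_System AS s
JOIN v_GS_ADD_REMOVE_PROGRAMS AS arp ON arp.ResourceID = s.ResourceID
WHERE arp.Publisher0 NOT LIKE 'Microsoft%'
ORDER BY s.Netbios_Name0, arp.DisplayName0
"""


def build_demo_db():
    """Create a tiny stand-in for the two SCCM views (illustrative data only)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE v_R_System (ResourceID INTEGER, Netbios_Name0 TEXT);
        CREATE TABLE v_GS_ADD_REMOVE_PROGRAMS (
            ResourceID INTEGER, Publisher0 TEXT, DisplayName0 TEXT, Version0 TEXT);
        INSERT INTO v_R_System VALUES (1, 'PC-0001'), (2, 'PC-0002');
        INSERT INTO v_GS_ADD_REMOVE_PROGRAMS VALUES
            (1, 'The Wireshark developer community', 'Wireshark 1.4.3', '1.4.3'),
            (1, 'Microsoft Corporation', 'Microsoft Visual C++ 2010', '10.0'),
            (2, 'VideoLAN', 'VLC media player 1.1.6', '1.1.6');
    """)
    return conn


rows = build_demo_db().execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The same join would need to be repeated for v_GS_ADD_REMOVE_PROGRAMS_64 to cover 64-bit registry entries.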
NIST Data
• Two main NIST data sets:
• CPE: Vendor / product dictionary
• CVE: List of vulnerabilities by year
• Formalized, structured format (== XML)
NIST’s CPE
CPE == “Common Platform Enumeration”
“Common Platform Enumeration (CPE) is a standardized method of describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise's computing assets.”
→ A master list of all vendors and all their products.
CPE Data - Header
CPE Vendor / Product dictionary
A typical item in the CPE Vendor / Product dictionary:
CPE Vendor / Product dictionary con’t
CPE Data – Vendor / Product
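To make the vendor / product structure concrete, here is a minimal sketch that splits a CPE 2.2 style URI (the `cpe:/a:vendor:product:version` form that appears in the match results later in this deck) into its leading fields. It deliberately ignores the later fields (update, edition, language) and does no percent-decoding.

```python
from collections import namedtuple

CpeName = namedtuple("CpeName", "part vendor product version")


def parse_cpe_uri(uri):
    """Split a CPE 2.2 URI into part / vendor / product / version."""
    prefix = "cpe:/"
    if not uri.startswith(prefix):
        raise ValueError("not a CPE 2.2 URI: %r" % uri)
    fields = uri[len(prefix):].split(":")
    # Pad missing trailing fields with empty strings.
    fields += [""] * (4 - len(fields))
    return CpeName(*fields[:4])


print(parse_cpe_uri("cpe:/a:wireshark:wireshark:1.4.3"))
```

The `part` field is "a" for applications, "o" for operating systems and "h" for hardware, which makes it easy to discard non-application entries early.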
NIST’s NVD
“The National Vulnerability Database is the U.S. government repository of standards-based vulnerability management data … This data enables automation of vulnerability management, security measurement, and compliance.” (Wikipedia)
NIST NVD Components
A typical NIST NVD entry has the following components:
Component  Name                                  Description
CVE        Common Vulnerabilities and Exposures  The basic vulnerability listing including CPE vendor / product.
CVSS       Common Vulnerability Scoring System   Standardized vulnerability impact
CWE        Common Weakness Enumeration           Augmented, standardized description of vulnerability
CVE – Vulnerability
A typical vulnerability entry in the NVD: CVE-2017-3547
NIST NVD Feeds
NVD CVE data available as a daily Feed:
• XML or (new) JSON format
• Compressed gzip or zip archive
• Delta file or full download by year
• Meta file with file sizes / SHA256 hash to determine if feed file has changed
https://nvd.nist.gov/vuln/data-feeds
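The meta-file check can be sketched as follows. The sketch assumes the meta file is a small text file of `key:value` lines that includes a `sha256` entry for the uncompressed feed file; that layout is an assumption to verify against the actual feed, and the sample data below is synthetic.

```python
import hashlib


def parse_meta(text):
    """Parse 'key:value' lines; only the first colon separates key from value."""
    meta = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def feed_changed(meta_text, local_feed_bytes):
    """Return True when the local feed copy no longer matches the published hash."""
    expected = parse_meta(meta_text).get("sha256", "").lower()
    actual = hashlib.sha256(local_feed_bytes).hexdigest()
    return expected != actual


# Synthetic example: a meta file describing a tiny stand-in feed file.
local_copy = b"<nvd/>"
sample_meta = (
    "lastModifiedDate:2017-05-01T03:00:00-04:00\n"
    "size:6\n"
    "sha256:" + hashlib.sha256(local_copy).hexdigest().upper() + "\n"
)
print(feed_changed(sample_meta, local_copy))
```

Only when `feed_changed` returns True does the (much larger) feed archive need to be downloaded again.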
NIST Data : TL;DR
• CPE: Vendor / product dictionary
• CVE: List of vulnerabilities by year
• CVSS: Vuln Impact (contained in CVE)
• XML standardized format
• Daily feeds available
Complex Data: The solution
The challenge:
• How to extract the unstructured vendor registry data from SCCM?
• How to match this data with the NIST vulnerability data?
The solution:
• Wise choice of Tools
• “Divide and conquer”
Make Good Technology choices
python: Good “data science” language
• fuzzywuzzy: Fuzzy matching
• xmltodict: XML parsing
pandas: Data will fit in computer memory. Great python-based data analysis tool.
scikit-learn: Reliable Artificial Intelligence / Machine Learning algorithms
Docker: Move “skunkworks” project around as required
ansible: Automate provisioning
Basic Approach
• Keep it native
• Use Windows to talk to Windows (AD, SCCM)
• Use Linux for Docker / python / pandas / scikit-learn
• Keep it simple
• 3rd-party software only, not Microsoft
• “Divide and conquer”
• Match vendors first
• Then match products for a given vendor
Basic Approach con’t
Use Machine Learning
• Treat this as two separate classification problems.
• Manually label data (especially vendors) since data sets are small
• Extract features from data using fuzzy matching
Sample Vendor Data – Potential Matches
SCCM                         CPE
The GnuPG Project            gnupg
DigitalVolcano Software Ltd  digitalvolcano
NETGEAR Powerline            netgear
MIT Media Lab                mit
Cisco Systems, Inc.          cisco
DameWare Development, LLC.   dameware
Bump Technologies, Inc.      bump_project
Open Source                  open_source_development_team
Sample Vendor Data – SCCM Vendor names
Will the real vendor please stand up?
Cisco                        Oracle
Cisco Consumer Products LLC  Oracle
Cisco Systems                Oracle and/or its affiliates
Cisco Systems, Inc           Oracle Corporation
Cisco Systems, Inc.          Oracle Corporation.
Cisco WebEx LLC              Oracle USA
                             Oracle, Inc.
ML – Feature Extraction
ML Classification Algorithm needs data “features”
Basic approach:
• Tokenization
• Stop words
• Fuzzy matching statistics
• String length
ML – Tokenization
• Convert name string into a set of tokens:
• Shift to lower case
• Split string into tokens using separators: _ . , ( ) + !
• Remove “Stop” words
• Tokens that appear often e.g. “Ltd.” “Inc.” “Project” “Software”
• Add little “value” in determining whether there is a match
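The tokenization steps above can be sketched in a few lines. The separator list is the one on the slide plus whitespace (an assumption), and the stop-word set contains only the four examples given; a real list would be longer.

```python
import re

# Stop words from the slide examples only; extend as the data requires.
STOP_WORDS = {"ltd", "inc", "project", "software"}
# Separators from the slide ( _ . , ( ) + ! ) plus whitespace (assumed).
SEPARATORS = re.compile(r"[\s_.,()+!]+")


def tokenize(name):
    """Lower-case a name, split it on separators and drop stop words."""
    tokens = SEPARATORS.split(name.lower())
    return {t for t in tokens if t and t not in STOP_WORDS}


print(tokenize("Cisco Systems, Inc."))
print(tokenize("open_source_development_team"))
```

Returning a set rather than a list means word order and repeated words no longer matter when two names are compared.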
ML – Fuzzy Matching I
Levenshtein or “edit” distance:
“The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.” (Wikipedia)
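For reference, the definition translates into the classic dynamic-programming routine below. This is a plain-Python teaching sketch, not the optimized implementation a matching library would use.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```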
ML – Fuzzy Matching II
python FuzzyWuzzy package
https://github.com/seatgeek/fuzzywuzzy
                  1st string                2nd string                Ratio
Simple Ratio      "this is a test"          "this is a test!"         97
Partial Ratio     "this is a test"          "this is a test!"         100
Token Sort Ratio  "fuzzy wuzzy was a bear"  "wuzzy fuzzy was a bear"  100
Token Set Ratio   "fuzzy was a bear"        "fuzzy fuzzy was a bear"  100
ML – Feature Extraction
To extract data “features”:
• Use the fuzzywuzzy pkg to calculate match ratios
• Also use string length
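A feature vector along these lines might look like the sketch below. To stay dependency-free it uses the standard library's `difflib.SequenceMatcher` as a stand-in for fuzzywuzzy, so the two ratio functions are approximations of the library's `ratio` and `token_sort_ratio`, not the library itself. The choice and order of features is illustrative.

```python
from difflib import SequenceMatcher


def simple_ratio(s1, s2):
    """Similarity of two strings on a 0-100 scale."""
    return int(round(100 * SequenceMatcher(None, s1, s2).ratio()))


def token_sort_ratio(s1, s2):
    """Similarity after lower-casing and sorting the words of each string."""
    def normalise(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(normalise(s1), normalise(s2))


def extract_features(sccm_name, cpe_name):
    """Build one feature row for a candidate (SCCM name, CPE name) pair."""
    return [
        simple_ratio(sccm_name.lower(), cpe_name.lower()),
        token_sort_ratio(sccm_name, cpe_name),
        len(sccm_name),
        len(cpe_name),
    ]


print(simple_ratio("this is a test", "this is a test!"))  # 97
print(extract_features("Cisco Systems, Inc.", "cisco"))
```

Each labelled pair then becomes one row of numbers plus a match / no-match label, which is the shape a classifier expects.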
ML – Label the input data sets
Observations:
• Accurately matching vendor data is crucial
• Data set size is small: ~10K vendors
Approach:
• Manually label data taking care to target important vendors
• Use the manually labelled data to train the ML algorithm
• Use ML-classified data + labelled data for final match processing!!
ML – Algorithm Selection I
Which algorithm to choose?
ML – Algorithm Selection II
Use simple K-Folds cross-validation
• Split labelled data into k consecutive folds
• Each fold is used once for validation while remaining k – 1 folds form the training set
• Repeat for each algorithm being tested
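The fold mechanics can be illustrated without any ML library. The sketch below only produces index splits for consecutive, unshuffled folds; training and scoring each candidate algorithm on those splits is left out.

```python
def k_fold_indices(n_samples, k):
    """Yield (training, validation) index lists for k consecutive folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        stop = start + size
        validation = list(range(start, stop))
        training = [i for i in range(n_samples) if not start <= i < stop]
        yield training, validation
        start = stop


for training, validation in k_fold_indices(10, 3):
    print(validation, "held out;", len(training), "samples used for training")
```

Averaging the validation score over the k folds gives one comparable number per algorithm.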
ML – Algorithm Selection III
Random Forest Classifier was the best.
• “Forest” of decision trees
• Diverse set of classifiers built by introducing randomness in classifier construction
• Prediction of the ensemble is the averaged prediction of the individual classifiers.
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
ML – Algorithm Tuning
“This algorithm has many parameters. How to tune for maximum accuracy?”
Use Randomized Grid Search with Cross-Validation
• Define initial parameter bounds / possible values
• Randomized search over the parameter space
• Use cross-validation to evaluate estimator accuracy
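The idea behind a randomized search can be shown with a toy loop. Everything here is hypothetical: the parameter names and values are illustrative, and `toy_score` is a placeholder for "mean cross-validated accuracy of a classifier built with these parameters".

```python
import random


def randomized_search(param_space, score_fn, n_iter=10, seed=0):
    """Sample n_iter random parameter combinations and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_iter):
        params = {name: rng.choice(values) for name, values in param_space.items()}
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical bounds / possible values for two classifier parameters.
space = {"n_estimators": [10, 50, 100, 200], "max_depth": [3, 5, 10, None]}


def toy_score(params):
    """Placeholder scoring function standing in for cross-validated accuracy."""
    depth = params["max_depth"] or 20
    return 1.0 - abs(params["n_estimators"] - 100) / 1000 - abs(depth - 10) / 100


best, score = randomized_search(space, toy_score, n_iter=50, seed=1)
print(best, round(score, 3))
```

Sampling at random rather than trying every combination keeps the number of expensive cross-validation runs fixed, whatever the size of the parameter space.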
ML – Software match sample results
Just how good is the matching?
CPE                                                SCCM DisplayName0
cpe:/a:wireshark:wireshark:1.4.3                   Wireshark 1.4.3
cpe:/a:videolan:vlc_media_player:1.1.6             VLC media player 1.1.6
cpe:/a:hp:headless_server_registry_update:1.0.0.0  Headless Server Registry Update
cpe:/a:hp:insight_management_agents:8.70.0.0       HP Insight Management Agents
cpe:/a:wireshark:wireshark:1.12.6                  Wireshark 1.12.6 (64-bit)
cpe:/a:adobe:indesign_cs4_common_base_files:6.0    Adobe InDesign CS4 Application Feature Set Fil..
cpe:/a:hp:smart_web_printing:4.60                  HP Smart Web Printing 4.60
cpe:/a:mozilla:firefox:45.0.1                      Mozilla Firefox 45.0.1 (x64 en-US)
cpe:/a:watchguard:watchguard_system_manager:-      WatchGuard System Manager 11.5.1
Complex Data : TL;DR
• Choose powerful technology: python / pandas / scikit-learn
• Split into 2 separate simple classification problems
• K-Folds Cross-validation picked Random Forest Classifier
• Randomized Grid Search with Cross-validation to tune
Challenge # 2
“DIRTY” DATA
Then Everything Blew Up!
Discovery: Real-life production data is full of anomalies!
• AD
• 80K extraneous hosts
• SCCM
• Did not manage “everything”
• Some hosts were “missing in action” e.g. laptops
• CPE
• Vendor product naming / versioning varied wildly from vendor to vendor
• Vendor buyouts / mergers impacted product naming e.g. Java
• Foreign language data / Unicode
“Dirty” data solutions I
• Spend hands-on time with the data
• Manual labelling → several code rewrites
• Use Defensive Coding
• Validate all input
• Use python “try”
• Handle Missing data
• The “bane” of pandas
• Either discard or initialize to a known value
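A small sketch of what "validate, try, handle missing data" can look like for a single inventory record follows. The record layout (a dict keyed by ResourceID, DisplayName0 and Publisher0) and the "unknown" default are assumptions for illustration.

```python
def clean_record(record, default_publisher="unknown"):
    """Return a cleaned copy of one software record, or None to discard it."""
    # Validate input: a record without a usable product name is discarded.
    name = record.get("DisplayName0")
    if not isinstance(name, str) or not name.strip():
        return None
    # Missing publisher: initialize to a known value instead of discarding.
    publisher = record.get("Publisher0")
    if not isinstance(publisher, str) or not publisher.strip():
        publisher = default_publisher
    # Use "try" around conversions that dirty data can break.
    try:
        resource_id = int(record.get("ResourceID"))
    except (TypeError, ValueError):
        return None
    return {
        "ResourceID": resource_id,
        "DisplayName0": name.strip(),
        "Publisher0": publisher.strip(),
    }


print(clean_record({"ResourceID": "7", "DisplayName0": " VLC ", "Publisher0": None}))
print(clean_record({"ResourceID": "n/a", "DisplayName0": "VLC"}))
```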
“Dirty” Data solutions - II
Discard extraneous data as quickly as possible, e.g.:
• Microsoft software data
• Deprecated NVD data
• Unmanaged SCCM hosts
• CVE listings for hardware / OS vulnerabilities
“Dirty” Data Solutions - III
Use heuristics to speed up matching
• Vendor:
• Ignore CPE vendors that are 1-2 characters long
• 1st word of CPE Vendor string has to be in the tokenized WMI SCCM Publisher0 string somewhere
• The condensed CPE name has to be shorter than the full WMI “Publisher0”
• Products:
• Release #’s should at least partially match
• At least one word in the CPE product name should be found in the SCCM equivalent
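The three vendor heuristics can be expressed as a cheap pre-filter that runs before any fuzzy matching. This is one literal reading of the rules above: "1st word" is taken to mean the text before the first underscore of the CPE vendor name, and the tokenizer is a simplified assumption.

```python
import re


def tokenize(name):
    """Lower-case and split a publisher string into a set of words."""
    return {t for t in re.split(r"[\s_.,()+!]+", name.lower()) if t}


def vendor_candidate(cpe_vendor, publisher):
    """Return True if the pair is worth passing on to the expensive matching step."""
    # Rule 1: ignore CPE vendors that are 1-2 characters long.
    if len(cpe_vendor) <= 2:
        return False
    # Rule 2: first word of the CPE vendor must appear in the tokenized Publisher0.
    first_word = cpe_vendor.lower().split("_")[0]
    if first_word not in tokenize(publisher):
        return False
    # Rule 3: the condensed CPE name must be shorter than the full Publisher0.
    return len(cpe_vendor) < len(publisher)


print(vendor_candidate("cisco", "Cisco Systems, Inc."))
print(vendor_candidate("hp", "HP Inc."))
```

Because each rule is a cheap string test, most of the vendor-by-publisher pairs can be thrown away before any ratio is computed.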
“Dirty” Data Solutions – IV
When all else fails, develop code for the “problem” data e.g. Java product versioning
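As an example of such special-case code, the sketch below converts a Java display name into a CPE-style version string. Both naming patterns are assumptions made for illustration ("Java 8 Update 121" on the SCCM side, "1.8.0:update_121" on the CPE side) and would need checking against the actual data.

```python
import re

# Assumed SCCM DisplayName0 pattern, e.g. "Java 8 Update 121 (64-bit)".
JAVA_NAME = re.compile(r"^Java(?:\(TM\))? (\d+) Update (\d+)", re.IGNORECASE)


def java_cpe_version(display_name):
    """Map a Java display name to an assumed CPE 'version:update' string, or None."""
    match = JAVA_NAME.match(display_name)
    if not match:
        return None
    major, update = match.groups()
    return "1.%s.0:update_%s" % (major, update)


print(java_cpe_version("Java 8 Update 121 (64-bit)"))
print(java_cpe_version("Wireshark 1.4.3"))
```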
“Dirty” Data Solutions: TL;DR
• Get “intimate” with the data
• “Shields up”: validate, “try”
• “Shoot from the hip”: Kill the “missing” data before it gets you
• “Take out the garbage” (data)
• Cheat if you have to: Heuristics
• “Plan B”: code around obstacles
Challenge # 3
PEOPLE ISSUES
Present the idea to Ops to get support
• Took my “great idea” to the SCCM Production Ops team
• They were kind enough to meet with me.
• On-site meeting with SCCM architect on conference call.
Production Ops reaction: Oops! Disaster!
• Talked “technology” instead of presenting from Ops viewpoint
• SCCM architect
• “The” key player
• 6 time zones away, end of his day
• Local meeting was not in his native language
• The “man in the wall”
Blessed by the King! (… Sort of)
• VP came to town
• Heard the prez
• Wanted “his” dashboard:
• For “yesterday”
• Budget: $0 / 0 hr
Ops reaction: We are Worried!!!!
• Ops people rapidly became concerned about visibility of VP Dashboard
• Started making noises about “SCCM DB Performance”
• Totally understandable reaction
Ops Proposition: “Take our nice siding here”
• Instead of direct production access, use a secondary non-prod DB employed for reporting / query
• Turned out that this DB underwent arbitrary “black box” ETL of SCCM data depending on Ops reporting needs, visibility req’t!
“People” Solutions I
• Operate in “pirate” mode: Budget of 0 hr $0 means:
• Run under the radar
• Be focused and efficient – refactor prototype code into prod-ready batch classes
• Be flexible, be creative:
• Docker-based project bounced from Ubuntu to Windows to CentOS to save $
• Run on lab PCs, on scrapped PCs, on laptops, anything that is available
• Make deals
• “Sell your grandmother to the highest bidder” to get that precious direct production access
“People” Solutions II
• Deliver quietly, slowly, and “down-sell” to ease viz concerns
• “Uh Mr VP, your dashboard is not quite ready yet …”
• “This is a new app and new technology. Data reliability is still to be proven …”
• Provide targeted Ops training
• Help the “dump truck” people understand the new-fangled “airplane” paradigm
• Give Ops control and help them find ways to leverage the new technology
Lessons learned
“What I didn’t do but should have”
• Data wrangling requires time and effort to do well
• Set management and user expectations at the outset
• Think “big”, think “production” to start
• “Take baby steps”: Always runnable continuous development
• Write test cases before writing code
• Write code in small reusable modules with clean interfaces
• Document and delegate
“People” Solutions: TL;DR
• Operate in “pirate” mode
• Be flexible, be creative
• When necessary:
• “Sell your grandmother to the highest bidder”
• Deliver quietly, slowly, and “down-sell”
• Provide targeted Ops training
• “Lessons learned”
Vulnmine
Demo Time
Contact Information
Github and Docker Hub: lorgor/vulnmine
Peerlyst: lorgor77
Loren Gordon
Email: lgordon - - at - - lgsec.biz
Twitter: --at-- lorgorsec
Web: lorgor@blogspot.ca
References
System Center 2012 Configuration Manager Unleashed
Meyler et al, 2012 Sams, ISBN-13: 978-0-672-33437-5
MSTN Sys Center 2012 Config Mgr SQL View Schema
https://technet.microsoft.com/en-ca/library/dn581978.aspx
MSTN Gallery: SCCM CfgMgr 1602 SQL Views Documentation
https://gallery.technet.microsoft.com/SCCM-Configmgr-1602-SQL-8db3b11c
Shouts to Pixabay for their free images
https://pixabay.com


Mining software vulns in SCCM / NIST's NVD

  • 1. Mining Software Vulns in SCCM / NIST’s NVD THE ROCKY ROAD TO DATA NIRVANA
  • 2. Overview • “Pleased to meet you” • The Playground • Challenge #1: Complex Data Structures • Challenge #2: “Dirty” unstructured data • Challenge #3: People issues • Lessons learned + Demo
  • 3. Who I am • Technical Security Architect at Ubisoft • Previous: 2 large financial institutions, a major retailer, a world-class telco, service bureaus • Generalist with a passion for all things “technical security”
  • 4. Disclaimer “Opinions expressed as well as the content of this presentation are the responsibility of the author. They do not represent Ubisoft company policy or views.”
  • 5. The Playground: “Find the panda” • 10K+ team members • 26 studios in 18 countries • Windows-centric • Creativity Rules! Where is the vulnerable non-Microsoft software installed?
  • 6.
  • 7. The Great Idea Microsoft’s SCCM: Reliable production software inventory NIST’s NVD database: Up-to-date vulnerability data Effective Patch Management
  • 8. The Great Idea: Why? • Avoids expen$ive licensing by using free public software • Vuln data can become a JSON feed into SIEM or DFIR “big data” mining app • Do the “impossible” with leading-edge technologies
  • 9. Challenge # 1 COMPLEX DATA STRUCTURES
  • 10. MS’ System Center Configuration Mgr • “The application people love to hate” • Indispensable for management of enterprise-scale Windows-centric environment • Back-end MS-Sql database: 1600+ tables, 6200+ views • Distributed component design leveragingWMI • On-premises deployment: complex architecture
  • 11. SCCM Components • 50+ components!!! • DLLs running (mostly) as threads, also some separate services • Communication: • In-core queues • Flat files stored in inboxes / outboxes
  • 12. SCCM and WMI SMS was the original WMI client “Everything” is architected using WMI: • Client-side • Internal control of agent operations • Discovery of hardware inventory • Server-side • SMS Provider isWMI provider • Exposes important database objects asWMI objects • ConfigMgr Console, SCCM auxiliary applications and tools are implemented asWMI Mgmt Applications.
  • 13. SCCM Discovery - I • Populates inventory data in SCCM database • 6 different methods • Which are enabled depends on site configuration • 4 methods target AD • 1 searches the surrounding network • 1 interacts with the SCCM client
  • 14. SCCM Discovery - II • AD Forest Discovery: IP subnets, AD sites • AD Group discovery: AD groups and memberships • AD User discovery: User accounts,AD attributes • AD System discovery: Computer discovery • Heartbeat discovery: • Enabled by default + must be enabled  Are clients healthy and reachable? • “creates discovery data records (DDRs) containing information about the client including network location, NetBIOS name, and operational status.” • Every 7 days by default. • Network discovery: Search domains, SNMP services, Dhcp servers. Disabled by default.
  • 15. SCCM Discovery - III “Garbage In – Garbage Out”
  • 16. SCCM Discovery - IV “Make friends with your SCCM administrator” • Methods enabled? • Polling interval?
  • 17. SCCM Data – “Getting to know you” “Hands-on” Exploring • MS Sql Studio Use AD to augment host inventory data • E.g. OU in Distinguished Name “Google isYour Friend” • Also SafariTechnical Library
  • 18. SCCM Data - I UseViews notTables • More stable interface • Better documentation • Permissions already in place • Performance – avoid locking tables • MS has done the “heavy lifting” e.g. joins, stored procedure definitions • More Community experience • This is what MS MVPs say to do Query SQL notWMI • More direct, simpler, better performance
  • 19. SCCM Data II – WMI Underpinnings • WMI Class Name: “SMS_xxx”  SQLView Name: v_xxx • WMI Property Names  Column names in SQLViews • View names > 30 chars are truncated • Column names have “0” appended to avoid conflicts with SQL reserved words
  • 20. SCCM Data III – View types • Inventory data: • Current: v_GS_< group name > • History: v_HS_< group name > • Discovery data: • WMI scalar properties: v_R_< resource type name > • WMI array properties: v_RA_< architecture name >_< group name >
  • 21. SCCM Data III – View types v_SchemaViews lists and categorizes ConfigMgr views
  • 22. SCCM Data IV – Inventory groups / views • v_GroupMap view lists inventory groups and views • Each one represents a WMI class configured for inventory collection in client agent settings Columns: DisplayName, InvClassName, InvHistoryClassName, MIFClass • System: v_GS_System, v_HS_System, SYSTEM • Add Remove Pgms: v_GS_ADD_REMOVE_PROGRAMS, v_HS_ADD_REMOVE_PROGRAMS, MICROSOFT|ADD_REMOVE_PROGRAMS|1.0 • Memory: v_GS_X86_PC_MEMORY, v_HS_X86_PC_MEMORY, MICROSOFT|X86_PC_MEMORY|1.0
  • 23. SCCM Data V - Collections • A Collection is “a logical group of resources in ConfigMgr” • v_Collection view: Collection meta-data • “All…” columns – system-wide collections Name: Members • All Systems: 25106 • All Users: 22903 • All Unknown Computers: 8 • All Windows Clients: 20630 • All Windows Servers: 3610
  • 24. SCCM Data VI – Which view to use? • v_R_System • From AD / Network / Heartbeat Discovery • Resource_ID • NetBIOS name, OS, AD domain • 60+ fields • v_GS_System • Updated when Hardware Inventory runs • Less accurate – host must have an active agent and be scheduled for hardware inventory • 10 fields
  • 25. SCCM Data : TL;DR In most production contexts, the relevant views are: • v_R_System • Host / user data • v_GS_ADD_REMOVE_PROGRAMS • v_GS_ADD_REMOVE_PROGRAMS_64 • Updated when Hardware Inventory runs • Installed software registry data
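To make the TL;DR concrete, here is a minimal sketch of a query that joins host data to installed-software data across the three views named on the slide. The view names come from the slide; the column names (`ResourceID`, `Netbios_Name0`, `Publisher0`, `DisplayName0`, `Version0`) and the Microsoft filter are assumptions for illustration and should be checked against your own ConfigMgr schema. An in-memory SQLite database with toy rows stands in for the SCCM SQL Server so the example can run anywhere.

```python
import sqlite3

# View names are from the slide; column names are assumptions to verify.
INSTALLED_SOFTWARE_SQL = """
SELECT s.Netbios_Name0, p.Publisher0, p.DisplayName0, p.Version0
FROM v_R_System AS s
JOIN (
    SELECT ResourceID, Publisher0, DisplayName0, Version0
    FROM v_GS_ADD_REMOVE_PROGRAMS
    UNION
    SELECT ResourceID, Publisher0, DisplayName0, Version0
    FROM v_GS_ADD_REMOVE_PROGRAMS_64
) AS p ON p.ResourceID = s.ResourceID
WHERE p.Publisher0 NOT LIKE 'Microsoft%'
ORDER BY s.Netbios_Name0, p.DisplayName0
"""

PROGRAM_COLUMNS = "(ResourceID INTEGER, Publisher0 TEXT, DisplayName0 TEXT, Version0 TEXT)"


def build_demo_db():
    """Create an in-memory stand-in for the SCCM views, filled with toy data."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE v_R_System (ResourceID INTEGER, Netbios_Name0 TEXT)")
    con.execute("CREATE TABLE v_GS_ADD_REMOVE_PROGRAMS " + PROGRAM_COLUMNS)
    con.execute("CREATE TABLE v_GS_ADD_REMOVE_PROGRAMS_64 " + PROGRAM_COLUMNS)
    con.executemany("INSERT INTO v_R_System VALUES (?, ?)",
                    [(1, "HOST-A"), (2, "HOST-B")])
    con.executemany("INSERT INTO v_GS_ADD_REMOVE_PROGRAMS VALUES (?, ?, ?, ?)",
                    [(1, "Example Vendor A", "Wireshark 1.4.3", "1.4.3"),
                     (1, "Microsoft Corporation", "Example Office Suite", "15.0")])
    con.executemany("INSERT INTO v_GS_ADD_REMOVE_PROGRAMS_64 VALUES (?, ?, ?, ?)",
                    [(2, "Example Vendor B", "VLC media player 1.1.6", "1.1.6")])
    return con


def installed_software(con):
    """Return (host, publisher, product, version) rows for non-Microsoft software."""
    return con.execute(INSTALLED_SOFTWARE_SQL).fetchall()


if __name__ == "__main__":
    for row in installed_software(build_demo_db()):
        print(row)
```

Against a real site database the same query text would be sent through a SQL Server driver instead of `sqlite3`; the join and union are plain SQL and do not depend on SQLite.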
  • 26. NIST Data • Two main NIST data sets: • CPE: Vendor / product dictionary • CVE: List of vulnerabilities by year • Formalized, structured format (== XML)
  • 27. NIST’s CPE CPE == “Common Platform Enumeration” “Common Platform Enumeration (CPE) is a standardized method of describing and identifying classes of applications, operating systems, and hardware devices present among an enterprise's computing assets.” → A master list of all vendors and all their products.
  • 28. CPE Data - Header
  • 29. CPE Vendor / Product dictionary A typical item in the CPE Vendor / Product dictionary:
  • 30. CPE Vendor / Product dictionary con’t
  • 31–39. CPE Data – Vendor / Product
  • 40. NIST’s NVD “The National Vulnerability Database is the U.S. government repository of standards-based vulnerability management data … This data enables automation of vulnerability management, security measurement, and compliance.” (Wikipedia)
  • 41. NIST NVD Components A typical NIST NVD entry has the following components (Component: Name. Description): • CVE: Common Vulnerabilities and Exposures. The basic vulnerability listing including CPE vendor / product. • CVSS: Common Vulnerability Scoring System. Standardized vulnerability impact • CWE: Common Weakness Enumeration. Augmented, standardized description of vulnerability
  • 42. CVE – Vulnerability A typical vulnerability entry in the NVD: CVE-2017-3547
  • 51. NIST NVD Feeds NVD CVE data available as a daily Feed: • XML or (new) JSON format • Compressed gzip or zip archive • Delta file or full download by year • Meta file with file sizes / SHA256 hash to determine if feed file has changed https://nvd.nist.gov/vuln/data-feeds
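The meta-file check can be sketched as follows: parse the small `.meta` file, compare its SHA256 value with the one saved from the last run, and only download the full feed when they differ. The `key:value` line layout and the field names in `SAMPLE_META` are assumptions about the meta file format, and `parse_meta` / `feed_changed` are hypothetical helper names.

```python
# Assumed layout of an NVD .meta file: one "key:value" pair per line.
SAMPLE_META = (
    "lastModifiedDate:2017-05-01T03:00:12-04:00\n"
    "size:1000\n"
    "zipSize:100\n"
    "gzSize:100\n"
    "sha256:ABC123\n"
)


def parse_meta(text):
    """Parse 'key:value' lines into a dict, splitting on the first colon only."""
    meta = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def feed_changed(meta_text, cached_sha256):
    """Return True when the feed should be downloaded again.

    A missing hash in the meta file, or no cached hash, counts as 'changed'
    so that the caller errs on the side of refreshing.
    """
    new_hash = parse_meta(meta_text).get("sha256", "").upper()
    old_hash = (cached_sha256 or "").upper()
    return new_hash == "" or new_hash != old_hash


if __name__ == "__main__":
    print(feed_changed(SAMPLE_META, "abc123"))   # same hash, different case
    print(feed_changed(SAMPLE_META, None))       # nothing cached yet
```

Splitting on the first colon only matters because the timestamp value itself contains colons.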
  • 52. NIST Data : TL;DR • CPE: Vendor / product dictionary • CVE: List of vulnerabilities by year • CVSS: Vuln Impact (contained in CVE) • XML standardized format • Daily feeds available
  • 53. Complex Data: The solution The challenge: • How to extract the unstructured vendor registry data from SCCM? • How to match this data with the NIST vulnerability data? The solution: • Wise choice of Tools • “Divide and conquer”
  • 54. Make Good Technology choices python: Good “data science” language • fuzzywuzzy: Fuzzy matching • xmltodict: XML parsing pandas: Data will fit in computer memory. Great python-based data analysis tool. scikit-learn: Reliable Artificial Intelligence / Machine Learning algorithms Docker: Move “skunkworks” project around as required ansible: Automate provisioning
  • 55. Basic Approach • Keep it native • Use Windows to talk to Windows (AD, SCCM) • Use Linux for Docker / python / pandas / scikit-learn • Keep it simple • 3rd-party software only, not Microsoft • “Divide and conquer” • Match vendors first • Then match products for a given vendor
  • 56. Basic Approach con’t Use Machine Learning • Treat this as two separate classification problems. • Manually label data (especially vendors) since data sets are small • Extract features from data using fuzzy matching
  • 57. Sample Vendor Data – Potential Matches (SCCM → CPE) • The GnuPG Project → gnupg • DigitalVolcano Software Ltd → digitalvolcano • NETGEAR Powerline → netgear • MIT Media Lab → mit • Cisco Systems, Inc. → cisco • DameWare Development, LLC. → dameware • Bump Technologies, Inc. → bump_project • Open Source → open_source_development_team
  • 58. Sample Vendor Data – SCCM Vendor names Will the real vendor please stand up? • Cisco: Cisco Consumer Products LLC; Cisco Systems; Cisco Systems, Inc; Cisco Systems, Inc.; Cisco WebEx LLC • Oracle: Oracle; Oracle and/or its affiliates; Oracle Corporation; Oracle Corporation.; Oracle USA; Oracle, Inc.
  • 59. ML – Feature Extraction ML Classification Algorithm needs data “features” Basic approach: • Tokenization • Stop words • Fuzzy matching statistics • String length
  • 60. ML – Tokenization • Convert name string into a set of tokens: • Shift to lower case • Split string into tokens using separators: _ . , ( ) + ! • Remove “Stop” words • Tokens that appear often e.g. “Ltd.” “Inc.” “Project” “Software” • Add little “value” in determining whether there is a match
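The tokenization steps above can be sketched in a few lines. The separator characters and the stop-word examples are taken from the slide; also splitting on whitespace, and storing stop words lower-cased without trailing punctuation, are assumptions made so the example works on real vendor strings. `tokenize` is a hypothetical helper name.

```python
import re

# Separators listed on the slide ( _ . , ( ) + ! ) plus whitespace (assumed).
SEPARATORS = re.compile(r"[\s_.,()+!]+")

# Stop-word examples from the slide, lower-cased with punctuation stripped.
STOP_WORDS = frozenset({"ltd", "inc", "project", "software"})


def tokenize(name, stop_words=STOP_WORDS):
    """Lower-case a vendor or product name and return its set of useful tokens."""
    tokens = (t for t in SEPARATORS.split(name.lower()) if t)
    return {t for t in tokens if t not in stop_words}


if __name__ == "__main__":
    for raw in ("Cisco Systems, Inc.", "The GnuPG Project", "open_source_development_team"):
        print(raw, "->", sorted(tokenize(raw)))
```

In practice the stop-word list would be built from token frequencies in the actual SCCM and CPE data rather than hard-coded.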
  • 61. ML – Fuzzy Matching I Levenshtein or “edit” distance: “The Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.” (Wikipedia)
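As an illustration of the definition quoted above, here is a small dynamic-programming sketch of the edit distance. It is a textbook formulation for explanation only; a production pipeline would normally rely on a tested library instead.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character insertions, deletions
    or substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    print(levenshtein("oracle corporation", "oracle corporation."))
```

Only two rows of the table are kept at a time, so memory grows with the length of the second string rather than with the product of both lengths.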
  • 62. ML – Fuzzy Matching II python FuzzyWuzzy package https://github.com/seatgeek/fuzzywuzzy (Measure: 1st string vs 2nd string → Ratio) • Simple Ratio: "this is a test" vs "this is a test!" → 97 • Partial Ratio: "this is a test" vs "this is a test!" → 100 • Token Sort Ratio: "fuzzy wuzzy was a bear" vs "wuzzy fuzzy was a bear" → 100 • Token Set Ratio: "fuzzy was a bear" vs "fuzzy fuzzy was a bear" → 100
  • 63. ML – Feature Extraction To extract data “features”: • Use the fuzzywuzzy pkg to calculate match ratios • Also use string length
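A feature vector for one (SCCM name, CPE name) pair might look like the sketch below. To keep the example dependency-free it uses the standard library's `difflib.SequenceMatcher` as a stand-in for the fuzzywuzzy ratios, so the numbers may differ from what fuzzywuzzy reports; the feature names and the `extract_features` helper are hypothetical.

```python
from difflib import SequenceMatcher


def simple_ratio(a, b):
    """Similarity of two strings on a 0-100 scale (difflib stand-in)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())


def token_sort_ratio(a, b):
    """Compare the strings after lower-casing and sorting their tokens,
    so that word order does not affect the score."""
    def normalize(s):
        return " ".join(sorted(s.lower().split()))
    return simple_ratio(normalize(a), normalize(b))


def extract_features(sccm_name, cpe_name):
    """Build one feature row for a candidate (SCCM, CPE) name pair."""
    return {
        "ratio": simple_ratio(sccm_name.lower(), cpe_name.lower()),
        "token_sort_ratio": token_sort_ratio(sccm_name, cpe_name),
        "len_sccm": len(sccm_name),
        "len_cpe": len(cpe_name),
        "len_diff": abs(len(sccm_name) - len(cpe_name)),
    }


if __name__ == "__main__":
    print(extract_features("Cisco Systems, Inc.", "cisco"))
```

Each labelled pair becomes one such row, and the rows together form the training matrix for the classifier.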
  • 64. ML – Label the input data sets Observations: • Accurately matching vendor data is crucial • Data set size is small: ~10K vendors Approach: • Manually label data taking care to target important vendors • Use the manually labelled data to train the ML algorithm • Use ML-classified data + labelled data for final match processing!!
  • 65. ML – Algorithm Selection I Which algorithm to choose?
  • 66. ML – Algorithm Selection II Use simple K-Folds cross-validation • Split labelled data into k consecutive folds • Each fold is used once for validation while remaining k – 1 folds form the training set • Repeat for each algorithm being tested
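The fold-splitting step can be shown without any ML library. The sketch below only generates the index splits (consecutive folds, each used once for validation); training and scoring each candidate algorithm on those splits is left to the caller. `k_fold_indices` is a hypothetical helper, not the scikit-learn API.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for k consecutive folds.

    When n_samples is not divisible by k, the first n_samples % k folds
    receive one extra sample.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base_size, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        stop = start + base_size + (1 if fold < extra else 0)
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, validation
        start = stop


if __name__ == "__main__":
    for train, validation in k_fold_indices(10, 3):
        print("train:", train, "validate:", validation)
```

Running every candidate algorithm over the same splits keeps the comparison fair, since each one sees identical training and validation data.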
  • 67. ML – Algorithm Selection III Random Forest Classifier was the best. • “Forest” of decision trees • Diverse set of classifiers built by introducing randomness in classifier construction • Prediction of the ensemble is the averaged prediction of the individual classifiers. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
  • 68. ML – Algorithm Tuning “This algorithm has many parameters. How to tune for maximum accuracy?” Use Randomized Grid Search with Cross-Validation • Define initial parameter bounds / possible values • Randomized search over the parameter space • Use cross-validation to evaluate estimator accuracy
  • 69. ML – Software match sample results Just how good is the matching? (CPE → SCCM DisplayName0) • cpe:/a:wireshark:wireshark:1.4.3 → Wireshark 1.4.3 • cpe:/a:videolan:vlc_media_player:1.1.6 → VLC media player 1.1.6 • cpe:/a:hp:headless_server_registry_update:1.0.0.0 → Headless Server Registry Update • cpe:/a:hp:insight_management_agents:8.70.0.0 → HP Insight Management Agents • cpe:/a:wireshark:wireshark:1.12.6 → Wireshark 1.12.6 (64-bit) • cpe:/a:adobe:indesign_cs4_common_base_files:6.0 → Adobe InDesign CS4 Application Feature Set Fil.. • cpe:/a:hp:smart_web_printing:4.60 → HP Smart Web Printing 4.60 • cpe:/a:mozilla:firefox:45.0.1 → Mozilla Firefox 45.0.1 (x64 en-US) • cpe:/a:watchguard:watchguard_system_manager:- → WatchGuard System Manager 11.5.1
  • 70. Complex Data : TL;DR • Choose powerful technology: python / pandas / scikit-learn • Split into 2 separate simple classification problems • K-Folds Cross-validation picked Random Forest Classifier • Randomized Grid Search with Cross-validation to tune
  • 72. Then Everything Blew Up! Discovery: Real-life production data is full of anomalies! • AD • 80K extraneous hosts • SCCM • Did not manage “everything” • Some hosts were “missing in action” e.g. laptops • CPE • Vendor product naming / versioning varied wildly from vendor to vendor • Vendor buyouts / mergers impacted product naming e.g. Java • Foreign language data / Unicode
  • 73. “Dirty” data solutions I • Spend hands-on time with the data • Manual labelling → several code rewrites • Use Defensive Coding • Validate all input • Use python “try” • Handle Missing data • The “bane” of pandas • Either discard or initialize to a known value
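A minimal sketch of the defensive pattern described above, applied to one raw inventory record: validate the input, wrap the risky access in `try`, discard records that are unusable, and initialize a missing value to a known default. The field names (`Publisher0`, `DisplayName0`, `Version0`), the `"unknown"` default and the `clean_record` helper are assumptions for illustration.

```python
def clean_record(raw, default_version="unknown"):
    """Return a cleaned record dict, or None when the record should be discarded."""
    try:
        publisher = raw["Publisher0"].strip()
        product = raw["DisplayName0"].strip()
    except (KeyError, AttributeError, TypeError):
        # Missing key, a None / non-string value, or raw is not a mapping.
        return None
    if not publisher or not product:
        return None  # an empty name cannot be matched against CPE data
    # Version is optional: fall back to a known value instead of None.
    version = str(raw.get("Version0") or default_version).strip()
    return {"publisher": publisher, "product": product, "version": version}


def clean_all(records):
    """Keep only the records that survive validation."""
    cleaned = (clean_record(r) for r in records)
    return [r for r in cleaned if r is not None]


if __name__ == "__main__":
    sample = [
        {"Publisher0": " VideoLAN ", "DisplayName0": "VLC media player 1.1.6", "Version0": "1.1.6"},
        {"Publisher0": None, "DisplayName0": "Orphan entry"},
        {"DisplayName0": "No publisher at all"},
    ]
    print(clean_all(sample))
```

Returning `None` for bad rows keeps the discard decision in one place, so the matching code downstream never has to test for missing values again.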
  • 74. “Dirty” Data solutions - II Discard extraneous data as quickly as possible, e.g.: • Microsoft software data • Deprecated NVD data • Unmanaged SCCM hosts • CVE listings for hardware / OS vulnerabilities
  • 75. “Dirty” Data Solutions - III Use heuristics to speed up matching • Vendor: • Ignore CPE vendors that are 1-2 characters long • 1st word of CPE Vendor string has to be in the tokenized WMI SCCM Publisher0 string somewhere • The condensed CPE name has to be shorter than the full WMI “Publisher0” • Products: • Release #’s should at least partially match • At least one word in the CPE product name should be found in the SCCM equivalent
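The three vendor heuristics can be sketched as a cheap pre-filter that runs before any fuzzy matching or classification. Treating the raw CPE vendor string as the “condensed” name, and splitting on whitespace in addition to punctuation, are assumptions; `vendor_candidate` is a hypothetical helper name.

```python
import re

SEPARATORS = re.compile(r"[\s_.,()+!]+")


def split_tokens(name):
    """Lower-case a name and split it into an ordered list of tokens."""
    return [t for t in SEPARATORS.split(name.lower()) if t]


def vendor_candidate(cpe_vendor, publisher):
    """Return True when a (CPE vendor, SCCM Publisher0) pair is worth scoring."""
    # Heuristic 1: ignore CPE vendors that are only 1-2 characters long.
    if len(cpe_vendor) <= 2:
        return False
    # Heuristic 2: the first word of the CPE vendor must appear among the
    # publisher's tokens.
    cpe_tokens = split_tokens(cpe_vendor)
    if not cpe_tokens or cpe_tokens[0] not in split_tokens(publisher):
        return False
    # Heuristic 3: the CPE name must be shorter than the full publisher string.
    return len(cpe_vendor) < len(publisher)


if __name__ == "__main__":
    pairs = [("cisco", "Cisco Systems, Inc."), ("hp", "HP Inc."),
             ("bump_project", "Bump Technologies, Inc."), ("netgear", "Cisco Systems")]
    for cpe, publisher in pairs:
        print(cpe, "|", publisher, "->", vendor_candidate(cpe, publisher))
```

Because every check is a plain string comparison, the filter is cheap enough to run over every vendor pair and leaves far fewer candidates for the more expensive scoring step.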
  • 76. “Dirty” Data Solutions – IV When all else fails, develop code for the “problem” data e.g. Java product versioning
  • 77. “Dirty” Data Solutions: TL;DR • Get “intimate” with the data • “Shields up”: validate, “try” • “Shoot from the hip”: Kill the “missing” data before it gets you • “Take out the garbage” (data) • Cheat if you have to: Heuristics • “Plan B”: code around obstacles
  • 79. Present the idea to Ops to get support • Took my “great idea” to the SCCM Production Ops team • They were kind enough to meet with me. • On-site meeting with SCCM architect on conference call.
  • 80. Production Ops reaction: Oops! Disaster! • Talked “technology” instead of presenting from Ops viewpoint • SCCM architect • “The” key player • 6 time zones away, end of his day • Local meeting was not in his native language • The “man in the wall”
  • 81. Blessed by the King! (… Sort of) • VP came to town • Heard the prez • Wanted “his” dashboard: • For “yesterday” • Budget: $0 / 0 hr
  • 82. Ops reaction: We are Worried!!!! • Ops people rapidly became concerned about visibility of VP Dashboard • Started making noises about “SCCM DB Performance” • Totally understandable reaction
  • 83. Ops Proposition: “Take our nice siding here” • Instead of direct production access, use a secondary non-prod DB employed for reporting / query • Turned out that this DB underwent arbitrary “black box” ETL of SCCM data depending on Ops reporting needs, visibility req’t!
  • 84. “People” Solutions I • Operate in “pirate” mode: Budget of 0 hr $0 means: • Run under the radar • Be focused and efficient – refactor prototype code into prod-ready batch classes • Be flexible, be creative: • Docker-based project bounced from Ubuntu to Windows to CentOS to save $ • Run on lab PCs, on scrapped PCs, on laptops, anything that is available • Make deals • “Sell your grandmother to the highest bidder” to get that precious direct production access
  • 85. “People” Solutions II • Deliver quietly, slowly, and “down-sell” to ease viz concerns • “Uh Mr. VP, your dashboard is not quite ready yet …” • “This is a new app and new technology. Data reliability is still to be proven …” • Provide targeted Ops training • Help the “dump truck” people understand the new-fangled “airplane” paradigm • Give Ops control and help them find ways to leverage the new technology
  • 86. Lessons learned “What I didn’t do but should have” • Data wrangling requires time and effort to do well • Set management and user expectations at the outset • Think “big”, think “production” to start • “Take baby steps”: Always runnable continuous development • Write test cases before writing code • Write code in small reusable modules with clean interfaces • Document and delegate
  • 87. “People” Solutions: TL;DR • Operate in “pirate” mode • Be flexible, be creative • When necessary: • “Sell your grandmother to the highest bidder” • Deliver quietly, slowly, and “down-sell” • Provide targeted Ops training • “Lessons learned”
  • 89. Contact Information Github and Docker Hub: lorgor/vulnmine Peerlyst: lorgor77 Loren Gordon Email: lgordon - - at - - lgsec.biz Twitter: --at-- lorgorsec Web: lorgor@blogspot.ca
  • 90. References • System Center 2012 Configuration Manager Unleashed, Meyler et al., 2012, Sams, ISBN-13: 978-0-672-33437-5 • MSTN Sys Center 2012 Config Mgr SQL View Schema: https://technet.microsoft.com/en-ca/library/dn581978.aspx • MSTN Gallery: SCCM CfgMgr 1602 SQL Views Documentation: https://gallery.technet.microsoft.com/SCCM-Configmgr-1602-SQL-8db3b11c • Shouts to Pixabay for their free images: https://pixabay.com