© 2022 nexB Inc. License: CC-BY-SA-4.0 https://www.nexb.com/ https://www.aboutcode.org/
VulnerableCode
Because a vulnerability database
should not be about Vulnerabilities
Hritik Vijay
▷ Passionate about open source, Linux and infosec
▷ Co-maintainer of VulnerableCode and univers
▷ Application Security Engineer at CRED
▷ Google Summer of Code mentor and participant
▷ Supporter of the suckless philosophy
▷ hritik.elf@gmail.com, @MrHritik
Tushar Goel
▷ Co-maintainer of VulnerableCode, FetchCode, univers and PackageURL
▷ Software Engineer at nexB
▷ Google Summer of Code mentor and participant
▷ tgoel@nexb.com
Agenda
▷ The state of vulnerability databases (open or not)
▷ How do we search? By package first!
▷ A better approach: package first
▷ Why VulnerableCode?
▷ VulnerableCode Solution
▷ How to create a package vulnerability database
○ Aggregate and correlate many data sources
○ Multi level data refinement
▷ Issues with vulnerability data
▷ Future plans
▷ Next steps: we need your help!
State of vulnerability databases (1)
▷ Databases with ghost packages
● DBs reference packages that do not exist anywhere
▷ Databases with ghost vulnerable versions
● Versions are reported as vulnerable when they are not, or the opposite
▷ Databases "Crying wolf", improbable vulnerabilities
● DBs report a package as vulnerable if anything in the
dependency tree may be vulnerable (Log4j)
▷ Impossible, self-contradictory version ranges
● Resolved to nothing or everything
▷ Redundant and noisy duplicated vulnerabilities
▷ Vulnerabilities mapped to hard-to-find CPEs
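A self-contradictory range is easy to spot once versions are comparable. The sketch below is only an illustration: it assumes plain dotted numeric versions and a single "min inclusive, max exclusive" range, and the helper names are hypothetical, not part of any database or library.

```python
def parse(version: str) -> tuple:
    """Turn a dotted numeric version such as '1.2.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_empty_range(min_inclusive: str, max_exclusive: str) -> bool:
    """Return True when no version can satisfy min <= v < max."""
    return parse(min_inclusive) >= parse(max_exclusive)


# ">=2.0.0 and <1.5.0" can never match: the range resolves to nothing.
print(is_empty_range("2.0.0", "1.5.0"))  # True
print(is_empty_range("1.0.0", "1.5.0"))  # False
```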
State of vulnerability databases (2)
The Telephone Game problem
▷ Everyone makes something up a little while trying to improve the data
▷ Each of them makes something up slightly differently
○ Too much reliance on automated tools on top of bad data
▷ Many DBs base their content on another DB's content
○ At each step the data is transformed (and damaged) in subtle ways
▷ You can have as many vulnerable ranges as there are DB interpretations.
○ None of them is entirely faithful to the upstream data
■ Over time this turns into the Telephone Game
▷ Upstream has better data
The true-true vulnerabilities are UPSTREAM!
Credit https://live.staticflickr.com/3798/10142017736_7f69d9f472_h.jpg
"Brown Bears at Brooks Camp, Katmai National Park"
by Christoph Strässler https://www.flickr.com/photos/christoph_straessler/
This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License.
https://creativecommons.org/licenses/by-sa/2.0/
https://www.youtube.com/watch?v=V4kWQSpCbvo
State of vulnerability databases (3)
▷ Databases of known FOSS software vulnerabilities are mostly
proprietary and/or privately maintained using proprietary tools,
processes and data.
▷ Why not open data? FOSS code likes open data about FOSS!
○ Some new entrants are now using open licenses:
○ GHSA (GitHub), OSV (Google), GitLab (one month delay)
▷ Emerging support for Package URL promotes interoperability
○ OSSF OSV, Sonatype OSSINDEX
▷ Emerging common format promotes interoperability
○ OSSF OSV
▷ But open formats do not mean common data identifiers
How do we search? By package first!
Questions to answer:
▷ Is package foo@1.0 known to be vulnerable?
○ What are the vulnerabilities?
○ What is the severity of the vulnerability?
○ Which version has a fix?
▷ More rarely: do I have any package affected by this
vulnerability?
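A package-first lookup can be pictured as an index keyed by package rather than by vulnerability. This is a minimal sketch under stated assumptions: the `Vulnerability` fields, the `INDEX` contents and the identifiers are all made up for illustration and do not reflect the VulnerableCode data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    vuln_id: str    # hypothetical identifier
    severity: str
    fixed_in: str   # first version carrying a fix


# Hypothetical index keyed by package, not by vulnerability.
INDEX = {
    "pkg:pypi/foo@1.0": [Vulnerability("VULN-0001", "high", "1.1")],
    "pkg:pypi/foo@1.1": [],
}


def lookup(purl: str) -> list:
    """Answer 'is this package version known to be vulnerable?' directly."""
    return INDEX.get(purl, [])


for vuln in lookup("pkg:pypi/foo@1.0"):
    print(vuln.vuln_id, vuln.severity, "fixed in", vuln.fixed_in)
```

One lookup answers all three questions at once: which vulnerabilities, how severe, and which version has the fix.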
A better approach: package first
▷ Vulnerability first: look up vulnerabilities, then find packages
▷ Package first (better): find packages, then look up vulnerabilities
Why VulnerableCode? Accuracy and correctness
▷ Vulnerabilities are important!
▷ Code is more important! Package first!
▷ There is no Free Software Vulnerability Database that is
○ Open!!
○ Comprehensive, most ecosystems (system + package)
○ Curated by expert humans
○ Validated: Trusted but correlated and verified data
○ Working towards correctness
VulnerableCode Solution
▷ Find packages with scanning, matching and tracing
○ Leverage all tools that report Package URLs (we support CPE too)
○ ScanCode.io and ScanCode Toolkit, ORT, Tern, OWASP Dependency-Track
and many more... or an SBOM (SPDX, CycloneDX)
▷ Look up package vulnerabilities in an open database that aggregates
them all
▷ Query by purl!
▷ Open data and open source tools are better.
▷ Eventually, experts review and curate all the data.
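To make "query by purl" concrete, here is a rough sketch of splitting a Package URL into its main parts. It is a simplified, hypothetical parser written for this illustration: it handles only the common `pkg:type/namespace/name@version` shape and ignores qualifiers, subpaths and percent-encoding, so it is not a substitute for a real purl library.

```python
from typing import NamedTuple, Optional


class SimplePurl(NamedTuple):
    type: str
    namespace: Optional[str]
    name: str
    version: Optional[str]


def parse_purl(purl: str) -> SimplePurl:
    """Parse the common 'pkg:type/namespace/name@version' shape only."""
    scheme, _, rest = purl.partition(":")
    if scheme != "pkg" or not rest:
        raise ValueError(f"not a purl: {purl!r}")
    # Drop the subpath and qualifiers, which this sketch does not handle.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    path, _, version = rest.partition("@")
    parts = path.strip("/").split("/")
    if len(parts) < 2:
        raise ValueError(f"purl needs a type and a name: {purl!r}")
    namespace = "/".join(parts[1:-1]) or None
    return SimplePurl(parts[0], namespace, parts[-1], version or None)


print(parse_purl("pkg:maven/org.apache.logging.log4j/log4j-core@2.14.1"))
```

The same string identifies the package for the scanner, the SBOM and the database, which is what makes it a practical query key.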
How to create a package vulnerability database?
▷ Use data from upstream, at the source of the source!
○ From the package maintainers and authors themselves
▷ Employ a confidence-based system: not all data are equally trusted
▷ Aggregate and correlate many data sources to enrich and validate
▷ Discover new relations between vulnerabilities and packages by
mining the graph
▷ Eventually curate and review for correctness with experts
Aggregate and correlate many data sources
▷ Collect and parse many sources
○ Store in a common data model
○ Cross-reference to create a graph
▷ Project-specific trackers
○ Apache, OpenSSL, nginx...
○ Bug trackers, CHANGELOGs.
▷ Linux distro trackers (Debian, Ubuntu, Red Hat, SUSE, Gentoo, ...)
○ Custom or standard formats (CVRF, OVAL)
▷ Application package trackers
○ NuGet, Rust, RubyGems, Pysec, RustSec, npm, ....
▷ NVD, and other aggregators: OSV, GHSA, GitLab, GSD and more.
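The "common data model plus cross-references" idea can be sketched in a few lines. Everything here is an assumption made for illustration: the `Record` fields, the source names and the placeholder identifiers are hypothetical, and the real model is richer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source: str    # where the record came from
    vuln_id: str   # identifier used by that source
    purl: str      # affected package, as a Package URL


# Hypothetical records as different sources might report them.
records = [
    Record("distro-tracker", "CVE-0000-0001", "pkg:deb/debian/foo"),
    Record("project-tracker", "CVE-0000-0001", "pkg:generic/foo"),
    Record("project-tracker", "CVE-0000-0002", "pkg:generic/foo"),
]

# Cross-reference both ways to form a small graph.
packages_by_vuln = defaultdict(set)
vulns_by_package = defaultdict(set)
for record in records:
    packages_by_vuln[record.vuln_id].add(record.purl)
    vulns_by_package[record.purl].add(record.vuln_id)

print(sorted(packages_by_vuln["CVE-0000-0001"]))
print(sorted(vulns_by_package["pkg:generic/foo"]))
```

Once every source lands in the same shape, a record from one tracker can confirm or contradict a record from another.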
Multi level data refinement
▷ The data is always imported in an "Advisory" staging area
▷ "Advisory" data are converted to "Vulnerability" and "Package" data and
their relationships using "Improvers"
▷ "Advisory" data that cannot be converted are kept with a log to investigate
and resolve the issue
▷ Specific improvers can mine the graph, cross-check with other data sources,
and resolve updated version ranges
Import → Improve → Filter
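The staging-then-improving flow might look roughly like this. It is a sketch under assumptions, not the VulnerableCode implementation: the `Advisory` fields and the `improve` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Advisory:
    aliases: list          # e.g. CVE ids reported by the source
    affected_purls: list   # packages named by the advisory
    log: list = field(default_factory=list)


def improve(advisories):
    """Convert staged advisories to (vulnerability alias, package) relations.

    Advisories that cannot be converted are kept, with a log entry.
    """
    relations, unresolved = [], []
    for advisory in advisories:
        if not advisory.aliases or not advisory.affected_purls:
            advisory.log.append("missing alias or package: needs investigation")
            unresolved.append(advisory)
            continue
        for alias in advisory.aliases:
            for purl in advisory.affected_purls:
                relations.append((alias, purl))
    return relations, unresolved


staged = [
    Advisory(["CVE-0000-0001"], ["pkg:pypi/foo"]),
    Advisory([], ["pkg:pypi/bar"]),  # cannot be converted yet
]
relations, unresolved = improve(staged)
print(relations)
print([advisory.log for advisory in unresolved])
```

Keeping the raw advisory next to its log means a later, smarter improver can retry it without re-importing.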
Issue: Ghost packages
▷ Some packages do not exist anywhere
○ Including versions that may not exist
▷ Solution: Look up upstream in the package registries and repositories
○ We can look up the registries and repositories to validate that the Package
URLs and versions are correct and really exist upstream
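A ghost check boils down to comparing what a database claims against what the registry actually published. In this sketch the registry is a hard-coded, hypothetical snapshot so that the example runs offline; a real check would query the registry or repository itself.

```python
# Hypothetical snapshot of the versions a registry reports as published.
REGISTRY = {
    "pkg:pypi/foo": {"1.0", "1.1", "2.0"},
}


def classify(purl: str, version: str) -> str:
    """Label a package version as real, a ghost version, or a ghost package."""
    published = REGISTRY.get(purl)
    if published is None:
        return "ghost package"
    if version not in published:
        return "ghost version"
    return "exists"


print(classify("pkg:pypi/foo", "1.1"))
print(classify("pkg:pypi/foo", "9.9"))
print(classify("pkg:pypi/does-not-exist", "1.0"))
```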
Issue: Lesser data quality
▷ Some vulnerability sources cannot be trusted
○ Known to make incorrect or inaccurate assertions about packages and
versions
▷ Solution: store confidence level
○ Confidence levels ensure we keep all inferred data, even lesser quality data
○ We do not trust others: we can discount the data sources we trust less
○ And we do not trust ourselves: we can discount the automated inferences we
make if we are not 100% sure about their correctness
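One way to picture stored confidence is a score attached to every relation, discounted by source and by whether the relation was inferred. The weights, the 0 to 100 scale and the halving rule below are assumptions invented for this sketch, not values used by VulnerableCode.

```python
from dataclasses import dataclass

# Hypothetical trust weights per data source, on a 0-100 scale.
SOURCE_CONFIDENCE = {"upstream-maintainer": 100, "aggregator": 60}


@dataclass(frozen=True)
class Relation:
    vuln_id: str
    purl: str
    source: str
    inferred: bool = False  # True when produced by an automated inference


def confidence(relation: Relation) -> int:
    """Discount less trusted sources, and discount our own inferences too."""
    score = SOURCE_CONFIDENCE.get(relation.source, 30)
    if relation.inferred:
        score = score // 2
    return score


def filter_by_confidence(relations, minimum):
    """Nothing is deleted from storage; callers pick a threshold when querying."""
    return [r for r in relations if confidence(r) >= minimum]


stored = [
    Relation("VULN-0001", "pkg:pypi/foo@1.0", "upstream-maintainer"),
    Relation("VULN-0001", "pkg:pypi/foo@0.9", "aggregator", inferred=True),
]
print([confidence(r) for r in stored])
print(len(filter_by_confidence(stored, 50)))
```

Because low-confidence data stays in storage, a later cross-check can raise its score instead of having to rediscover it.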
Issue: Incorrect or missing versions
▷ Some package versions are missing or incorrect
○ Affected version statements are often ambiguous
○ "All the versions of package foo are vulnerable to CVE XYZ" really means all
the versions of foo known at the time this advisory was published were
vulnerable.
▷ Solution: store version range, resolve range and "time travel"
○ We store version ranges as a compact string (using new purl "vers" spec)
○ We expand and resolve ranges with "univers" version handling library
○ Package versions can be rechecked for being in a vulnerable range as needed
■ In the past and in the future
○ Improvers can do "time travel" based on version publication dates and determine
if a package version was vulnerable in the past when published.
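The range check and the "time travel" idea can be sketched together. This is a simplified illustration, not the "vers" spec or the univers API: it assumes dotted numeric versions, a single "low inclusive, high exclusive" interval, and a hypothetical table of publication dates.

```python
from datetime import date

# Hypothetical publication dates for the versions of a package "foo".
PUBLISHED = {
    "1.0": date(2020, 1, 10),
    "1.1": date(2020, 6, 1),
    "2.0": date(2021, 3, 15),
}


def parse_version(text):
    """Turn a dotted numeric version into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def in_range(version, low, high):
    """Check low <= version < high, a single simple interval."""
    return parse_version(low) <= parse_version(version) < parse_version(high)


def versions_known_at(advisory_date):
    """'All versions are vulnerable' covers only versions published by then."""
    known = [v for v, published in PUBLISHED.items() if published <= advisory_date]
    return sorted(known, key=parse_version)


# An advisory dated 2020-12-31 saying "all versions" cannot cover 2.0.
print(versions_known_at(date(2020, 12, 31)))  # ['1.0', '1.1']
# A stored range can be rechecked whenever a new version shows up.
print(in_range("2.0", "1.0", "2.0"))  # False
```

Storing the range rather than an expanded version list is what allows the recheck: the list changes as new versions are published, the range does not.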
Issue: Duplicated data
▷ Some vulnerabilities are duplicates
○ Leads to many noisy relationships and weaker correlation abilities
▷ Solution: Introduce a new set of vulnerability ID aliases
○ Before, we tracked by CVE ID; only if there was none did we create a
VULCOID (a VulnerableCode ID for a vulnerability)
○ We now always use a VULCOID and track many aliases (including a CVE
ID when available) for each vulnerability
○ Aliases are used for data reconciliation during the second step of
"Improvers", meaning that we avoid a large number of duplicates
○ Improver jobs will further merge additional duplicates
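Alias-based reconciliation can be illustrated as merging any advisories that share at least one alias. The function and the placeholder identifiers below are hypothetical; this is a sketch of the general idea, not the actual improver logic.

```python
def merge_by_alias(advisories):
    """Group advisories that share at least one alias.

    Each advisory is a collection of aliases; the result is a list of
    merged alias sets, one per distinct vulnerability.
    """
    merged = []
    for aliases in advisories:
        aliases = set(aliases)
        overlapping = [group for group in merged if group & aliases]
        for group in overlapping:
            aliases |= group
            merged.remove(group)
        merged.append(aliases)
    return merged


# Three advisories from different sources, two of them about the same issue.
advisories = [
    {"CVE-0000-0001", "GHSA-xxxx-0001"},
    {"GHSA-xxxx-0001", "PYSEC-0000-1"},
    {"CVE-0000-0002"},
]
for group in merge_by_alias(advisories):
    print(sorted(group))
```

Each merged set would then hang off a single internal identifier, so a source that only knows one of the aliases still lands on the same vulnerability.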
Other issues
▷ Many data sources - redundant, unstructured, messy,
incomplete
○ We grew to appreciate the complexity of the task and why commercial
vendors currently dominate the space
○ Solution: integrate them all (all the data sources) to cross-check them
▷ Old, obsolete, or less useful data
○ More is not always better - e.g. old vulnerabilities on Windows 95
○ Commercial-only software (Windows, etc.) or hardware is less interesting
○ Solution: let go of some of the past and ignore the legacy!
Future plans
▷ More primary data sources, going upstream
▷ Actual commits fixing and introducing vulnerabilities
▷ YARA Rules: enable finer grain detection of actual vulnerable code
▷ Community peer curation system including curation UI
▷ AI/ML for data quality improvements
▷ .... and good ole heuristics
▷ VulnTotal: a tool to compare all the vulnerability DBs
○ Think VirusTotal for vulnerabilities
If you want to learn more about our projects
▷ Register for our upcoming webinar on July 21 at
https://nexb.com/0721-vulnerablecode
▷ Read our latest blog post at
https://nexb.com/vulnerablecode-public-release
▷ Download VulnerableCode at
https://github.com/nexB/vulnerablecode/releases/latest
▷ Visit https://nexb.com/vulnerablecode for more information
If you want to help
You can contribute code, time, docs (or cash?)
▷ Use these fine FOSS tools and specs
● https://github.com/nexb/vulnerablecode
● https://www.aboutcode.org/
● https://github.com/nexB/
● https://github.com/package-url
▷ Join the conversation at
● https://gitter.im/aboutcode-org
▷ Donate at
● https://opencollective.com/aboutcode
Credits
▷ Presentation template by SlidesCarnival licensed under CC-BY-4.0
▷ Photograph by Unsplash licensed under Unsplash License
▷ Other content licensed under CC-BY-4.0
Summary
▷ Why Is There No Free Software Vulnerability Database?
▷ This was the provocative question we were asking two years ago when introducing the VulnerableCode
FOSS project. The situation has evolved positively since then -- in particular thanks to the creation of the
Open Source Security Foundation and the Open Source Vulnerability (OSV) project and schema.
▷ Yet the question is still relevant as there is still no comprehensive and open aggregated vulnerabilities
database that would cover most system and application package ecosystems. There are also continuous
looming concerns about the licensing of vulnerability feeds and how to best share and curate
vulnerability data.
▷ Hritik and Tushar will present the state of the open vulnerability databases and how new designs and
models that are not centered on vulnerabilities can help software and security professionals determine if
their FOSS software packages are subject to vulnerabilities more quickly, more efficiently and with less
noise.
▷ They will also review some new and innovative techniques deployed in VulnerableCode to mine
open source vulnerabilities more effectively, including "time travel", "log mining" and "range expansion", to
produce better focused vulnerability information.
A Vulnerability Database Should Not Be About Vulnerabilities!

  • 1. © 2022 nexB Inc. License: CC-BY-SA-4.0 https://www.nexb.com/ https://www.aboutcode.org/ VulnerableCode Because a vulnerability database should not be about Vulnerabilities 1
  • 2. © 2022 nexB Inc. License: CC-BY-SA-4.0 https://www.nexb.com/ https://www.aboutcode.org/ ▷ Passionate about opensource, Linux and infosec ▷ Co-maintainer at VulnerableCode and univers ▷ Application Security engineer at CRED ▷ Google summer of code mentor and a participant ▷ Supporter of suckless philosophy ▷ hritik.elf@gmail.com, @MrHritik Hritik Vijay 2
  • 3. Tushar Goel
    ▷ Co-maintainer of VulnerableCode, FetchCode, univers and PackageURL
    ▷ Software Engineer at nexB
    ▷ Google Summer of Code mentor and participant
    ▷ tgoel@nexb.com
  • 4. Agenda
    ▷ The state of vulnerability databases (open or not)
    ▷ How do we search? By package first!
    ▷ A better approach: package first
    ▷ Why VulnerableCode?
    ▷ VulnerableCode Solution
    ▷ How to create a package vulnerability database
      ○ Aggregate and correlate many data sources
      ○ Multi level data refinement
    ▷ Issues with vulnerability data
    ▷ Future plans
    ▷ Next steps: we need your help!
  • 5. State of vulnerability databases (1)
    ▷ Databases with ghost packages
      ● DBs reference packages that do not exist anywhere
    ▷ Databases with ghost vulnerable versions
      ● Versions are listed as vulnerable even though they are not, or the opposite
    ▷ Databases "crying wolf" with improbable vulnerabilities
      ● DBs report a package as vulnerable if anything in its dependency tree may be vulnerable (Log4j)
    ▷ Impossible, self-contradictory version ranges
      ● They resolve to nothing or to everything
    ▷ Redundant and noisy duplicated vulnerabilities
    ▷ Vulnerabilities mapped to hard-to-find CPEs
  • 6. State of vulnerability databases (2): The Telephone Game problem
    ▷ Everyone makes something up a little while trying to improve the data
    ▷ Each of them makes something up slightly differently
      ○ Too much reliance on automated tools on top of bad data
    ▷ Many DBs base their content on another DB's content
      ○ At each step the data is transformed (and damaged) in subtle ways
    ▷ You can have as many vulnerable ranges as there are DB interpretations
      ○ None of them is entirely faithful to the upstream data
        ■ Over time this turns into the Telephone Game
    ▷ Upstream has better data
  • 7. The true-true vulnerabilities are UPSTREAM!
    Credit: https://live.staticflickr.com/3798/10142017736_7f69d9f472_h.jpg "Brown Bears at Brooks Camp, Katmai National Park" by Christoph Strässler https://www.flickr.com/photos/christoph_straessler/
    This work is licensed under a Creative Commons Attribution-ShareAlike 2.0 License. https://creativecommons.org/licenses/by-sa/2.0/
    https://www.youtube.com/watch?v=V4kWQSpCbvo
  • 8. State of vulnerability databases (3)
    ▷ Databases of known FOSS vulnerabilities are mostly proprietary and/or privately maintained using proprietary tools, processes and data.
    ▷ Why not open data? FOSS code likes open data about FOSS!
      ○ Some new entrants are now using open licenses:
      ○ GHSA (GitHub), OSV (Google), GitLab (one month delay)
    ▷ Emerging support for Package URL promotes interoperability
      ○ OSSF OSV, Sonatype OSSINDEX
    ▷ Emerging common format promotes interoperability
      ○ OSSF OSV
    ▷ But open formats do not mean common data identifiers
  • 9. How do we search? By package first!
    Questions to answer:
    ▷ Is package foo@1.0 known to be vulnerable?
      ○ What are the vulnerabilities?
      ○ What is the severity of each vulnerability?
      ○ Which version has a fix?
    ▷ More rarely: do I have any package affected by this vulnerability?
  • 10. A better approach: package first
    Look up vulnerabilities, find packages
    Find packages, look up vulnerabilities
  • 11. Why VulnerableCode? Accuracy and correctness
    ▷ Vulnerabilities are important!
    ▷ Code is more important! Package first!
    ▷ There is no Free Software Vulnerability Database that is
      ○ Open!!
      ○ Comprehensive, covering most ecosystems (system + package)
      ○ Curated by expert humans
      ○ Validated: trusted, but with correlated and verified data
      ○ Working towards correctness
  • 12. VulnerableCode Solution
    ▷ Find packages with scanning, matching and tracing
      ○ Leverage all tools that report package-url (we support CPE too)
      ○ ScanCode.io and Toolkit, ORT, Tern, OWASP Dependency Track and many more ... or an SBOM (SPDX, CycloneDX)
    ▷ Look up package vulnerabilities in an open database that aggregates them all
    ▷ Query by purl!
    ▷ Open data and open source tools are better.
    ▷ Eventually, experts review and curate all the data.
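The package-first lookup described on this slide can be sketched as follows. This is a minimal illustration only, assuming a hypothetical in-memory index keyed by purl string; `INDEX`, `add_record`, `lookup`, and every purl, ID, and severity below are made-up placeholders, not the VulnerableCode API or data model.

```python
from collections import defaultdict

# Hypothetical in-memory index: purl string -> list of vulnerability records.
INDEX = defaultdict(list)


def add_record(purl, vuln_id, severity, fixed_in):
    """Attach one vulnerability record to an exact package version."""
    INDEX[purl].append({"id": vuln_id, "severity": severity, "fixed_in": fixed_in})


def lookup(purl):
    """Return the known vulnerabilities for one exact package version."""
    return list(INDEX.get(purl, []))


# Placeholder data: the ID and versions are invented for illustration.
add_record("pkg:pypi/foo@1.0", "VULN-EXAMPLE-1", "high", "1.0.1")

report = lookup("pkg:pypi/foo@1.0")    # one record: what, how severe, where fixed
clean = lookup("pkg:pypi/foo@1.0.1")   # nothing known for the fixed version
```

Starting from the package answers the questions an operator usually has (is this version affected, how badly, and which version fixes it) in a single lookup.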
  • 13. How to create a package vulnerability database?
    ▷ Use data from upstream, at the source of the source!
      ○ From the package maintainers and authors themselves
    ▷ Employ a confidence-based system: not all data are equally trusted
    ▷ Aggregate and correlate many data sources to enrich and validate
    ▷ Discover new relations between vulnerabilities and packages by mining the graph
    ▷ Eventually curate and review for correctness with experts
  • 14. Aggregate and correlate many data sources
    ▷ Collect and parse many sources
      ○ Store in a common data model
      ○ Cross-reference to create a graph
    ▷ Project-specific trackers
      ○ Apache, OpenSSL, nginx...
      ○ Bug trackers, CHANGELOGs
    ▷ Linux distro trackers (Debian, Ubuntu, Red Hat, SUSE, Gentoo, ...)
      ○ Custom or standard formats (CVRF, OVAL)
    ▷ Application package trackers
      ○ NuGet, Rust, RubyGems, Pysec, RustSec, npm, ...
    ▷ NVD, and other aggregators: OSV, GHSA, GitLab, GSD and more
  • 15. Multi level data refinement
    ▷ The data is always imported into an "Advisory" staging area
    ▷ "Advisory" data are converted to "Vulnerability" and "Package" data and their relationships using "Improvers"
    ▷ "Advisory" data that cannot be converted are kept with a log to investigate and resolve issues
    ▷ Specific improvers can mine the graph, cross-check with other data sources, and resolve updated version ranges
    [Diagram: Import, Improve, Filter]
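A rough sketch of the import-then-improve flow on this slide, under stated assumptions: the `Advisory` class, the `improve` function, and the sample records are hypothetical stand-ins, and the conversion rule (an advisory is usable only if it carries a `pkg:` purl) is a deliberate simplification of whatever real improvers check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Advisory:
    """Raw record held in the staging area (hypothetical shape)."""
    source: str
    vuln_id: str
    purl: Optional[str]  # None models an advisory with no usable package reference


def improve(advisories):
    """Convert staged advisories into (vulnerability, package) relations.

    Advisories that cannot be converted are not dropped: they are returned
    with a reason so they can be investigated later.
    """
    relations = []
    unresolved = []
    for adv in advisories:
        if adv.purl and adv.purl.startswith("pkg:"):
            relations.append((adv.vuln_id, adv.purl))
        else:
            unresolved.append((adv, "no usable package URL"))
    return relations, unresolved


# Placeholder staging data; the source name and IDs are invented.
staging = [
    Advisory("example-tracker", "VULN-EXAMPLE-1", "pkg:npm/foo@1.0.0"),
    Advisory("example-tracker", "VULN-EXAMPLE-2", None),
]
relations, unresolved = improve(staging)
```

Keeping the staging records separate from the derived relations means an improver can be rerun later without importing the source again.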
  • 16. Issue: Ghost packages
    ▷ Some packages do not exist anywhere
      ○ Including versions that may not exist
    ▷ Solution: look up upstream in the package registries and repositories
      ○ We can query the registries and repositories to validate that the Package URLs and versions are correct and really exist upstream
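The ghost-package check on this slide might look like the sketch below. It assumes a hypothetical `REGISTRY` dictionary standing in for a real registry query, and a deliberately simplified purl split that ignores qualifiers and subpaths; neither is how VulnerableCode itself performs the lookup.

```python
# Stand-in for an upstream registry: versionless purl -> published versions.
# A real check would query the package registry or repository instead.
REGISTRY = {"pkg:pypi/foo": {"1.0", "1.1"}}


def split_purl(purl):
    """Split 'pkg:type/name@version' into (base, version).

    Simplified on purpose: qualifiers ('?...') and subpaths ('#...') are not handled.
    """
    base, _, version = purl.partition("@")
    return base, version


def exists_upstream(purl, registry=REGISTRY):
    """Return True only if both the package and the version are published."""
    base, version = split_purl(purl)
    return version in registry.get(base, set())


real = exists_upstream("pkg:pypi/foo@1.0")             # package and version exist
ghost_version = exists_upstream("pkg:pypi/foo@9.9")    # package exists, version does not
ghost_package = exists_upstream("pkg:pypi/ghost@1.0")  # package does not exist at all
```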
  • 17. Issue: Lesser data quality
    ▷ Some vulnerability sources cannot be trusted
      ○ Known to make incorrect or inaccurate assertions about packages and versions
    ▷ Solution: store a confidence level
      ○ Confidence levels ensure we keep all inferred data, even lesser quality data
      ○ We do not trust others: we can discount the data sources we trust less
      ○ And we do not trust ourselves: we can discount our own automated inferences if we are not 100% sure about their correctness
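One way to picture the confidence level from this slide: every assertion is stored with a score, nothing is thrown away, and the discounting happens at query time. The source names, the 0-100 scale, and the function names below are assumptions made for this sketch.

```python
# Hypothetical trust weights per kind of source, on a 0-100 scale.
SOURCE_CONFIDENCE = {"upstream-maintainer": 100, "aggregator": 60, "inferred": 30}

assertions = []  # every assertion is kept, whatever its quality


def record(vuln_id, purl, source):
    """Store an 'affects' assertion together with the confidence of its source."""
    confidence = SOURCE_CONFIDENCE.get(source, 10)  # unknown sources get a low score
    assertions.append({"vuln": vuln_id, "purl": purl, "confidence": confidence})


def affected_by(purl, minimum_confidence=0):
    """List vulnerability IDs asserted for a purl at or above a confidence threshold."""
    return sorted(
        a["vuln"]
        for a in assertions
        if a["purl"] == purl and a["confidence"] >= minimum_confidence
    )


# Placeholder assertions with invented IDs.
record("VULN-EXAMPLE-1", "pkg:pypi/foo@1.0", "upstream-maintainer")
record("VULN-EXAMPLE-2", "pkg:pypi/foo@1.0", "inferred")

everything = affected_by("pkg:pypi/foo@1.0")
trusted_only = affected_by("pkg:pypi/foo@1.0", minimum_confidence=50)
```

Filtering at read time rather than at write time is what lets lower quality data stay in the database without polluting the default answers.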
  • 18. Issue: Incorrect or missing versions
    ▷ Some package versions are missing or incorrect
      ○ Affected version statements are often ambiguous
      ○ "All the versions of package foo are vulnerable to CVE XYZ" really means that all the versions of foo known at the time the advisory was published were vulnerable.
    ▷ Solution: store version range, resolve range and "time travel"
      ○ We store version ranges as a compact string (using the new purl "vers" spec)
      ○ We expand and resolve ranges with the "univers" version handling library
      ○ A package version can be checked and rechecked for being in a vulnerable range as needed
        ■ In the past and in the future
      ○ Improvers can do "time travel" based on version publication dates and determine if a package version was vulnerable in the past when published.
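The "time travel" idea on this slide can be illustrated with publication dates alone. The sketch below does not use univers or the "vers" syntax; the release table, the dates, and the function name are invented for illustration.

```python
from datetime import date

# Hypothetical release history for package foo: version -> publication date.
RELEASES = {
    "1.0": date(2020, 1, 15),
    "1.1": date(2020, 6, 1),
    "2.0": date(2021, 3, 10),
}


def versions_known_at(releases, advisory_date):
    """Resolve 'all versions are vulnerable' to the versions that existed
    when the advisory was published, ordered by publication date."""
    known = [v for v, published in releases.items() if published <= advisory_date]
    return sorted(known, key=releases.get)


# An advisory dated end of 2020 cannot have been talking about 2.0.
affected = versions_known_at(RELEASES, date(2020, 12, 31))
```

Under this reading, version 2.0 stays out of the vulnerable set until some other evidence says otherwise, instead of being swept in by an "all versions" statement written before it existed.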
  • 19. Issue: Duplicated data
    ▷ Some vulnerabilities are duplicates
      ○ Leads to many noisy relationships and lesser correlation abilities
    ▷ Solution: introduce a new set of vulnerability ID aliases
      ○ Before, we tracked by CVE ID; only if there was none did we create a VULCOID (VulnerableCode ID for a vulnerability)
      ○ We now always use a VULCOID and track many aliases (including a CVE ID when available) for each vulnerability
      ○ Aliases are used for data reconciliation during the second step, "Improvers", meaning that we avoid a large number of duplicates
      ○ Improver jobs will further merge additional duplicates
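Alias-based reconciliation, as described on this slide, amounts to grouping records that share at least one identifier. The following is a minimal sketch of that grouping, not the actual improver logic; the function name and all IDs are placeholders.

```python
def merge_by_alias(records):
    """Group records that share at least one alias.

    Each record is an iterable of alias strings; the result is a list of
    disjoint alias sets, one per distinct vulnerability.
    """
    groups = []
    for aliases in records:
        merged = set(aliases)
        # Fold every existing group that overlaps with this record into it.
        overlapping = [g for g in groups if g & merged]
        for g in overlapping:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups


# Placeholder IDs: two sources describe the same issue under different names.
records = [
    {"CVE-EXAMPLE-1", "GHSA-EXAMPLE-A"},
    {"GHSA-EXAMPLE-A", "OSV-EXAMPLE-1"},
    {"CVE-EXAMPLE-2"},
]
groups = merge_by_alias(records)
```

Each resulting group would then map to a single internal ID, with the other identifiers kept as aliases.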
  • 20. Other issues
    ▷ Many data sources: redundant, unstructured, messy, incomplete
      ○ We grew to appreciate the complexity of the task and why commercial vendors currently dominate the space
      ○ Solution: integrate all the data sources to cross-check them
    ▷ Old, obsolete, or less useful data
      ○ More is not always better, e.g. old vulnerabilities on Windows 95
      ○ Commercial-only software (Windows, etc.) or hardware is less interesting
      ○ Solution: let go of some of the past and ignore the legacy
  • 21. Future plans
    ▷ More primary data sources, going upstream
    ▷ Actual commits fixing and introducing vulnerabilities
    ▷ YARA rules: enable finer-grained detection of actual vulnerable code
    ▷ Community peer curation system including a curation UI
    ▷ AI/ML for data quality improvements
    ▷ ... and good ole heuristics
    ▷ VulnTotal: a tool to compare all the vulnerability DBs
      ○ Think VirusTotal for vulnerabilities
  • 22. If you want to learn more about our projects
    ▷ Register for our upcoming webinar on July 21 at https://nexb.com/0721-vulnerablecode
    ▷ Read our latest blog post at https://nexb.com/vulnerablecode-public-release
    ▷ Download VulnerableCode at https://github.com/nexB/vulnerablecode/releases/latest
    ▷ Visit https://nexb.com/vulnerablecode for more information
  • 23. If you want to help
    You can contribute code, time, docs (or cash?)
    ▷ Use these fine FOSS tools and specs
      ● https://github.com/nexb/vulnerablecode
      ● https://www.aboutcode.org/
      ● https://github.com/nexB/
      ● https://github.com/package-url
    ▷ Join the conversation at
      ● https://gitter.im/aboutcode-org
    ▷ Donate at
      ● https://opencollective.com/aboutcode
  • 24. Credits
    ▷ Presentation template by SlidesCarnival licensed under CC-BY-4.0
    ▷ Photograph by Unsplash licensed under Unsplash License
    ▷ Other content licensed under CC-BY-4.0
  • 25. Summary
    ▷ Why Is There No Free Software Vulnerability Database?
    ▷ This was the provocative question we were asking two years ago when introducing the VulnerableCode FOSS project. The situation has evolved positively since then, in particular thanks to the creation of the Open Source Security Foundation and the Open Source Vulnerability (OSV) project and schema.
    ▷ Yet the question is still relevant, as there is still no comprehensive and open aggregated vulnerability database that covers most system and application package ecosystems. There are also continuing concerns about the licensing of vulnerability feeds and how best to share and curate vulnerability data.
    ▷ Hritik and Tushar will present the state of the open vulnerability databases and how new designs and models that are not centered on vulnerabilities can help software and security professionals determine whether their FOSS packages are subject to vulnerabilities more quickly, more efficiently and with less noise.
    ▷ They will also review some new and innovative techniques deployed in VulnerableCode to mine open source vulnerabilities more effectively, including "time travel", "log mining" and "range expansion", to produce better focused vulnerability information.