Mais conteúdo relacionado Semelhante a 014 Graph Database Techniques to Reduce Risk and Innovate - NODES2022 AMERICAS Beginner 6 - Michael Bandor.pdf (20) 014 Graph Database Techniques to Reduce Risk and Innovate - NODES2022 AMERICAS Beginner 6 - Michael Bandor.pdf1. 1
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213
Graph Database Techniques to
Reduce Risk and Innovate
NODES 2022 Conference
Michael Bandor
2. 2
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Document Markings
Copyright 2022 Carnegie Mellon University.
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon
University for the operation of the Software Engineering Institute, a federally funded research and development center.
References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not
necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED
ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR
IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR
MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY
DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT
INFRINGEMENT.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for
non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal
permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at
permission@sei.cmu.edu.
DM22-1020
3. 3
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Presentation Topics
Problem Space
Modeling the Problem
Prototype Implementation
Future Considerations - Relationship to a Software Bill of Materials
(SBOM)
Questions
4. 4
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Problem Space
5. 5
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
A US Department of Defense (DoD) customer was struggling to identify the risk
associated with software obsolescence in their system
Contractor was tracking software information across various databases and providing a
spreadsheet every month to the DoD customer
• Tendency for organizations tend to use tools such as spreadsheets to track information
since they are readily available
• A spreadsheet does not provide additional insight into (1) the relationships in the data
and (2) any ripple effects. These relationships and ripple effects directly affect any risks
the program may be tracking, while others may not be known because of secondary
and tertiary relationships in the data.
A prototype implementation using Neo4j was created to ingest the contractor provided
data to create a knowledge graph to better understand the relationships in the data and
manage risk.
Background
6. 6
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Problem Space
Problem: Tracking COTS, GOTS and FOSS obsolescence issues and vulnerabilities in
software intensive systems
Questions that need to be asked:
• What is affected?
• When is the product no longer supported?
• What is the immediate impact?
• What are the secondary and tertiary dependencies?
• Where are the products located/used in the system and environments?
• What are the impacts of identified vulnerabilities (CVEs)?
• How do we track this?
7. 7
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Problem Space
Organizations tend to use whatever is available (typically a spreadsheet) to identify and
track the issues
Tracking relationships and impacts gets to be a very complicated situation at best
Visualization of the impacts is extremely difficult if even possible
A Design Structure Matrix (DSM)1 approach would show clusters but not secondary and
tertiary dependencies.
1 This is sometimes also referred to as dependency structure matrix, dependency structure
method, dependency source matrix, etc. “ https://dsmweb.org/ ,
https://en.wikipedia.org/wiki/Design_structure_matrix
8. 8
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Modeling the Problem
9. 9
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Why a graph representation?
• Everything is naturally connected, networks of people, transactions, supply chains
• “Graphs form the foundation of modern data and analytics techniques, with capabilities
to enhance and improve user collaboration, Machine Learning models, and explainable
Artificial Intelligence.” – Gartner, “Top 10 Tech Trends in Data and Analytics,” 16 Feb
20211
A property graph lets the problem be represented through Nodes and Relationships of the
nodes to each other
1 https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021
10. 10
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Nodes* are the entities in the graph.
• They represent objects (nouns)
• They can hold any number of attributes (key-value pairs) called properties.
• Nodes can be tagged with labels, representing their different roles in your domain.
• Node labels may also serve to attach metadata (such as index or constraint
information) to certain nodes.
* The Property Graph Model, https://neo4j.com/developer/graph-database/
11. 11
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Relationships* provide directed, named, semantically relevant connections between two
node entities (e.g., Employee WORKS_FOR Company).
• Relationships connect nodes and represent actions (verbs)
• A relationship always has a direction, a type, a start node, and an end node.
• Like nodes, relationships can also have properties. In most cases, relationships have
quantitative properties, such as weights, costs, distances, ratings, time intervals, or
strengths.
• Due to the efficient way relationships are stored, two nodes can share any number or
type of relationships without sacrificing performance.
• Although they are stored in a specific direction, relationships can always be navigated
efficiently in either direction.
* The Property Graph Model, https://neo4j.com/developer/graph-database/
12. 12
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation Example
13. 13
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
Using a product centric approach, what information needs to be tracked?
A product has:
• Name & Version
• Manufacturer (Company)
• End of Support Date (EOS)
• End of Extended Support Date (EEOS)
• End of Life Date (EOL)
• Category (COTS/GOTS/FOSS)
• Possible dependency on another software product
• Runs on an operating system
• May have been through a formal obsolescence evaluation process
• May have one or more vulnerabilities
14. 14
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
An operating system has:
• Name & Version
• Manufacturer
• End of Support Date (EOS)
• End of Extended Support Date (EEOS)
• End of Life Date (EOL)
• May have been through a formal obsolescence evaluation process
• May have one or more vulnerabilities
Operating Systems tend to have a different impact on obsolescence issues and
vulnerabilities than other software products
Recommend tracking as separate entity either as a node or additional node label
15. 15
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Product
Defining the nodes (Product, OS, etc.) and the relationships:
A Product:
• Runs on an OS (RUNS_ON relationship)
• May depend on another product (DEPENDS_ON relationship where it exists)
• Is created by a company (represented as a node; CREATED_BY relationship)
• May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence
issue has been identified and is being tracked externally, i.e., Jiratm ticket)
• May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it
exists)
• Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship)
Jira is a trademark of Atlassian
16. 16
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
An OS:
• Is created by a company (represented as a node; CREATED_BY relationship)
• May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence
issue has been identified and is being tracked externally, i.e., Jiratm ticket)
• May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it
exists)
• Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship)
Obsolescence issue is intentionally modeled as a separate node
• Contains a link to a Jira ticket
• There may be more than one product covered under a ticket
• Also contains risk information
17. 17
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
INSTALLED_IN
CONTAINS
CONTAINS
ENVIRONMENT PRODUCT
OS
VULNERABILITY
Company
Name:
OBS Issue
18. 18
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Prototype Implementation
19. 19
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Possible Use Case Scenarios
1. A manufacturer announces its software product will no longer be supported next year.
What are the impacts to the system?
2. A vulnerability has been identified in a software product. Where is it installed in the
system in order to determine the risk?
3. A company sells a product line to a foreign company. Where is this product in the
system and what is the impact?
4. A new environment is going to be established to support product testing. What
products need to be on that environment?
5. A newer version of a product is now available. Is there anything keeping the upgrade
from happening (dependencies)?
20. 20
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
21. 21
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
Goals of the implementation:
• Use the existing data source (spreadsheet) from the contractor
• Government customer would ingest the spreadsheet each month
• Government customer would perform the analysis
• Provide installation instructions, model details and documentation, use case scenarios
with examples and Cypher scripts, primary script for initial data ingest and updates
• Make it as easy to use as possible
22. 22
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
Implementation Steps:
1. Data analysis and understanding of how the data was being used
2. Develop and refine the model
3. Conduct small, test sample imports to get the logic correct
4. Based on prior tests, build the entire ingest script to ingest the spreadsheet data using APOC
5. Determine how to handle data entry errors (e.g., empty fields that are used as indexes, typographical errors in some
data elements, date representation, etc.)
6. Confirm the import and subsequent graph was correct – refine import script and model were needed
7. Implement revisions to the import script to handle subsequent updates (e.g., flagging new nodes and updated node
data)
8. Perform analyses using the use case scenarios to prove the correctness of the model and assumptions
9. Tune the script and graph model for better performance
10. Create support documentation and provide all deliverables to the customer
23. 23
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation - Results
End results:
3 different month’s worth of data ingested
1554 nodes created/updated
• 134 Companies
• 1074 Products
• 164 Jira tickets
• 12 Software Categories
• 166 End of Support date clusters
24. 24
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation – Analysis Examples
Product End of Support Clusters Jira Ticket Clusters
Previously unrealized/unseen clusters of data!
25. 25
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Implementation Prototype - Challenges
• Data source (spreadsheet) was created from 3 separate databases (multiple data elements in each
row)
• Getting the “magic decoder” to understand what each column actually meant and was used for (only a
small fraction of the columns were actually used for the prototype)
• Determining the best way to parse the spreadsheet (single path with refinements vs. multiple passes)
• A single pass method was used and then subsequent passes through the nodes to further refine
the graph (including cleanup of temporary properties in the Product node)
• Use of an End_of_Service node for performance and analysis vs. parsing the EndofService date in
each node
• Refactoring the initial ingest script to handle updates rather than using 2 separate scripts
• Use of a dateCreated and dateUpdated property when performing the MERGE operation with the
nodes (e.g.. Set the dateCreated with the ON CREATE option and set the dateUpdated with the
ON MERGE option)
• Documenting the model, use cases (with examples) and installation instructions for novice users
26. 26
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Future Considerations -
Relationship to a Software Bill of
Materials (SBOM)
27. 27
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Future Considerations - Relationship to a Software Bill of
Materials (SBOM)
What is an SBOM?
An SBOM is a formal record containing the details and supply chain relationships of various
components used in building software. In addition to establishing these minimum elements, this report
defines the scope of how to think about minimum elements, describes SBOM use cases for greater
transparency in the software supply chain, and lays out options for future evolution.1
Relationships to Graphs:
Depth. An SBOM should contain all primary (top level) components, with all their transitive
dependencies listed. At a minimum, all top-level dependencies must be listed with enough detail to
seek out the transitive dependencies recursively.
Going further into the graph will provide more information. As organizations begin SBOM, depth
beyond the primary components may not be easily available due to existing requirements with
subcomponent suppliers. Eventual adoption of SBOM processes will enable access to additional
depth through deeper levels of transparency at the subcomponent level. It should be noted that some
use cases require complete or mostly complete graphs, such as the ability to “prove the negative” that
a given component is not on an organization’s network.1
1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce,
July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
28. 28
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Future Considerations - Relationship to a Software Bill of
Materials (SBOM)
Relationships to Graphs (cont):
Known Unknowns. For instances in which the full dependency graph is not enumerated in the
SBOM, the SBOM author must explicitly identify “known unknowns.” That is, the dependency data
draws a clear distinction between a component that has no further dependencies, and a component
for which the presence of dependencies is unknown and incomplete...
Other Component Relationships. The minimum elements of SBOM are connected through a single
type of relationship: dependency. That is, X is included in Y. This relationship is implied in the SBOM
graph structure…1
There is a very noticeable correlation between the “minimum” elements in an SBOM and
that used in the software obsolescence graph!
• Candidate for future research and implementation
1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce,
July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
29. 29
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Questions
30. 30
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Contact Information
Michael S. Bandor
Senior Software Engineer
Software Engineering Institute (SEI)
Carnegie Mellon University (CMU)
mbandor@sei.cmu.edu
412-268-8423