SlideShare uma empresa Scribd logo
1 de 30
Baixar para ler offline
1
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Software Engineering Institute
Carnegie Mellon University
Pittsburgh, PA 15213
Graph Database Techniques to
Reduce Risk and Innovate
NODES 2022 Conference
Michael Bandor
2
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Document Markings
Copyright 2022 Carnegie Mellon University.
This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon
University for the operation of the Software Engineering Institute, a federally funded research and development center.
References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not
necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute.
NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED
ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR
IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR
MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY
DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT
INFRINGEMENT.
[DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for
non-US Government use and distribution.
This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal
permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at
permission@sei.cmu.edu.
DM22-1020
3
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Presentation Topics
Problem Space
Modeling the Problem
Prototype Implementation
Future Considerations - Relationship to a Software Bill of Materials
(SBOM)
Questions
4
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Problem Space
5
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
A US Department of Defense (DoD) customer was struggling to identify the risk
associated with software obsolescence in their system
Contractor was tracking software information across various databases and providing a
spreadsheet every month to the DoD customer
• Tendency for organizations tend to use tools such as spreadsheets to track information
since they are readily available
• A spreadsheet does not provide additional insight into (1) the relationships in the data
and (2) any ripple effects. These relationships and ripple effects directly affect any risks
the program may be tracking, while others may not be known because of secondary
and tertiary relationships in the data.
A prototype implementation using Neo4j was created to ingest the contractor provided
data to create a knowledge graph to better understand the relationships in the data and
manage risk.
Background
6
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Problem Space
Problem: Tracking COTS, GOTS and FOSS obsolescence issues and vulnerabilities in
software intensive systems
Questions that need to be asked:
• What is affected?
• When is the product no longer supported?
• What is the immediate impact?
• What are the secondary and tertiary dependencies?
• Where are the products located/used in the system and environments?
• What are the impacts of identified vulnerabilities (CVEs)?
• How do we track this?
7
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Problem Space
Organizations tend to use whatever is available (typically a spreadsheet) to identify and
track the issues
Tracking relationships and impacts gets to be a very complicated situation at best
Visualization of the impacts is extremely difficult if even possible
A Design Structure Matrix (DSM)1 approach would show clusters but not secondary and
tertiary dependencies.
1 This is sometimes also referred to as dependency structure matrix, dependency structure
method, dependency source matrix, etc. “ https://dsmweb.org/ ,
https://en.wikipedia.org/wiki/Design_structure_matrix
8
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Modeling the Problem
9
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Why a graph representation?
• Everything is naturally connected, networks of people, transactions, supply chains
• “Graphs form the foundation of modern data and analytics techniques, with capabilities
to enhance and improve user collaboration, Machine Learning models, and explainable
Artificial Intelligence.” – Gartner, “Top 10 Tech Trends in Data and Analytics,” 16 Feb
20211
A property graph lets the problem be represented through Nodes and Relationships of the
nodes to each other
1 https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021
10
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Nodes* are the entities in the graph.
• They represent objects (nouns)
• They can hold any number of attributes (key-value pairs) called properties.
• Nodes can be tagged with labels, representing their different roles in your domain.
• Node labels may also serve to attach metadata (such as index or constraint
information) to certain nodes.
* The Property Graph Model, https://neo4j.com/developer/graph-database/
11
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation
Relationships* provide directed, named, semantically relevant connections between two
node entities (e.g., Employee WORKS_FOR Company).
• Relationships connect nodes and represent actions (verbs)
• A relationship always has a direction, a type, a start node, and an end node.
• Like nodes, relationships can also have properties. In most cases, relationships have
quantitative properties, such as weights, costs, distances, ratings, time intervals, or
strengths.
• Due to the efficient way relationships are stored, two nodes can share any number or
type of relationships without sacrificing performance.
• Although they are stored in a specific direction, relationships can always be navigated
efficiently in either direction.
* The Property Graph Model, https://neo4j.com/developer/graph-database/
12
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Property Graph Representation Example
13
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
Using a product centric approach, what information needs to be tracked?
A product has:
• Name & Version
• Manufacturer (Company)
• End of Support Date (EOS)
• End of Extended Support Date (EEOS)
• End of Life Date (EOL)
• Category (COTS/GOTS/FOSS)
• Possible dependency on another software product
• Runs on an operating system
• May have been through a formal obsolescence evaluation process
• May have one or more vulnerabilities
14
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
An operating system has:
• Name & Version
• Manufacturer
• End of Support Date (EOS)
• End of Extended Support Date (EEOS)
• End of Life Date (EOL)
• May have been through a formal obsolescence evaluation process
• May have one or more vulnerabilities
Operating Systems tend to have a different impact on obsolescence issues and
vulnerabilities than other software products
Recommend tracking as separate entity either as a node or additional node label
15
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Product
Defining the nodes (Product, OS, etc.) and the relationships:
A Product:
• Runs on an OS (RUNS_ON relationship)
• May depend on another product (DEPENDS_ON relationship where it exists)
• Is created by a company (represented as a node; CREATED_BY relationship)
• May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence
issue has been identified and is being tracked externally, i.e., Jiratm ticket)
• May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it
exists)
• Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship)
Jira is a trademark of Atlassian
16
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
An OS:
• Is created by a company (represented as a node; CREATED_BY relationship)
• May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence
issue has been identified and is being tracked externally, i.e., Jiratm ticket)
• May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it
exists)
• Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship)
Obsolescence issue is intentionally modeled as a separate node
• Contains a link to a Jira ticket
• There may be more than one product covered under a ticket
• Also contains risk information
17
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Modeling the Problem
INSTALLED_IN
CONTAINS
CONTAINS
ENVIRONMENT PRODUCT
OS
VULNERABILITY
Company
Name:
OBS Issue
18
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Prototype Implementation
19
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Possible Use Case Scenarios
1. A manufacturer announces its software product will no longer be supported next year.
What are the impacts to the system?
2. A vulnerability has been identified in a software product. Where is it installed in the
system in order to determine the risk?
3. A company sells a product line to a foreign company. Where is this product in the
system and what is the impact?
4. A new environment is going to be established to support product testing. What
products need to be on that environment?
5. A newer version of a product is now available. Is there anything keeping the upgrade
from happening (dependencies)?
20
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
21
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
Goals of the implementation:
• Use the existing data source (spreadsheet) from the contractor
• Government customer would ingest the spreadsheet each month
• Government customer would perform the analysis
• Provide installation instructions, model details and documentation, use case scenarios
with examples and Cypher scripts, primary script for initial data ingest and updates
• Make it as easy to use as possible
22
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation
Implementation Steps:
1. Data analysis and understanding of how the data was being used
2. Develop and refine the model
3. Conduct small, test sample imports to get the logic correct
4. Based on prior tests, build the entire ingest script to ingest the spreadsheet data using APOC
5. Determine how to handle data entry errors (e.g., empty fields that are used as indexes, typographical errors in some
data elements, date representation, etc.)
6. Confirm the import and subsequent graph was correct – refine import script and model were needed
7. Implement revisions to the import script to handle subsequent updates (e.g., flagging new nodes and updated node
data)
8. Perform analyses using the use case scenarios to prove the correctness of the model and assumptions
9. Tune the script and graph model for better performance
10. Create support documentation and provide all deliverables to the customer
23
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation - Results
End results:
3 different month’s worth of data ingested
1554 nodes created/updated
• 134 Companies
• 1074 Products
• 164 Jira tickets
• 12 Software Categories
• 166 End of Support date clusters
24
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Prototype Implementation – Analysis Examples
Product End of Support Clusters Jira Ticket Clusters
Previously unrealized/unseen clusters of data!
25
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Implementation Prototype - Challenges
• Data source (spreadsheet) was created from 3 separate databases (multiple data elements in each
row)
• Getting the “magic decoder” to understand what each column actually meant and was used for (only a
small fraction of the columns were actually used for the prototype)
• Determining the best way to parse the spreadsheet (single path with refinements vs. multiple passes)
• A single pass method was used and then subsequent passes through the nodes to further refine
the graph (including cleanup of temporary properties in the Product node)
• Use of an End_of_Service node for performance and analysis vs. parsing the EndofService date in
each node
• Refactoring the initial ingest script to handle updates rather than using 2 separate scripts
• Use of a dateCreated and dateUpdated property when performing the MERGE operation with the
nodes (e.g.. Set the dateCreated with the ON CREATE option and set the dateUpdated with the
ON MERGE option)
• Documenting the model, use cases (with examples) and installation instructions for novice users
26
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Graph Database Techniques to Reduce Risk and Innovate
Future Considerations -
Relationship to a Software Bill of
Materials (SBOM)
27
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Future Considerations - Relationship to a Software Bill of
Materials (SBOM)
What is an SBOM?
An SBOM is a formal record containing the details and supply chain relationships of various
components used in building software. In addition to establishing these minimum elements, this report
defines the scope of how to think about minimum elements, describes SBOM use cases for greater
transparency in the software supply chain, and lays out options for future evolution.1
Relationships to Graphs:
Depth. An SBOM should contain all primary (top level) components, with all their transitive
dependencies listed. At a minimum, all top-level dependencies must be listed with enough detail to
seek out the transitive dependencies recursively.
Going further into the graph will provide more information. As organizations begin SBOM, depth
beyond the primary components may not be easily available due to existing requirements with
subcomponent suppliers. Eventual adoption of SBOM processes will enable access to additional
depth through deeper levels of transparency at the subcomponent level. It should be noted that some
use cases require complete or mostly complete graphs, such as the ability to “prove the negative” that
a given component is not on an organization’s network.1
1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce,
July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
28
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Future Considerations - Relationship to a Software Bill of
Materials (SBOM)
Relationships to Graphs (cont):
Known Unknowns. For instances in which the full dependency graph is not enumerated in the
SBOM, the SBOM author must explicitly identify “known unknowns.” That is, the dependency data
draws a clear distinction between a component that has no further dependencies, and a component
for which the presence of dependencies is unknown and incomplete...
Other Component Relationships. The minimum elements of SBOM are connected through a single
type of relationship: dependency. That is, X is included in Y. This relationship is implied in the SBOM
graph structure…1
There is a very noticeable correlation between the “minimum” elements in an SBOM and
that used in the software obsolescence graph!
• Candidate for future research and implementation
1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce,
July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
29
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Questions
30
Graph Database Techniques to Reduce Risk and Innovate
© 2022 Carnegie Mellon University
[[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution.
Contact Information
Michael S. Bandor
Senior Software Engineer
Software Engineering Institute (SEI)
Carnegie Mellon University (CMU)
mbandor@sei.cmu.edu
412-268-8423

Mais conteúdo relacionado

Semelhante a 014 Graph Database Techniques to Reduce Risk and Innovate - NODES2022 AMERICAS Beginner 6 - Michael Bandor.pdf

MAYER, James 2015
MAYER, James 2015MAYER, James 2015
MAYER, James 2015
James Mayer
 
Online student management system
Online student management systemOnline student management system
Online student management system
Mumbai Academisc
 
Project Deliverable 5 Infrastructure and SecurityThis assignm.docx
Project Deliverable 5 Infrastructure and SecurityThis assignm.docxProject Deliverable 5 Infrastructure and SecurityThis assignm.docx
Project Deliverable 5 Infrastructure and SecurityThis assignm.docx
woodruffeloisa
 
Csqe sample exam 1 solutions 05.00.04
Csqe sample exam 1   solutions 05.00.04Csqe sample exam 1   solutions 05.00.04
Csqe sample exam 1 solutions 05.00.04
binodrit98
 

Semelhante a 014 Graph Database Techniques to Reduce Risk and Innovate - NODES2022 AMERICAS Beginner 6 - Michael Bandor.pdf (20)

Strategic Enterprise Risk and Data Architecture
Strategic Enterprise Risk and Data ArchitectureStrategic Enterprise Risk and Data Architecture
Strategic Enterprise Risk and Data Architecture
 
MIT520 software architecture assignments (2012) - 1
MIT520   software architecture assignments (2012) - 1MIT520   software architecture assignments (2012) - 1
MIT520 software architecture assignments (2012) - 1
 
What's Next with Government Big Data
What's Next with Government Big Data What's Next with Government Big Data
What's Next with Government Big Data
 
EP Project Charter OpenWells2.3
EP Project Charter OpenWells2.3EP Project Charter OpenWells2.3
EP Project Charter OpenWells2.3
 
CAST Architecture Checker
CAST Architecture CheckerCAST Architecture Checker
CAST Architecture Checker
 
Mayer, james 2015
Mayer, james 2015Mayer, james 2015
Mayer, james 2015
 
MAYER, James 2015
MAYER, James 2015MAYER, James 2015
MAYER, James 2015
 
Performance Testing Cloud-Based Systems
Performance Testing Cloud-Based SystemsPerformance Testing Cloud-Based Systems
Performance Testing Cloud-Based Systems
 
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
4.2_Microgrid Design Toolkit_Eddy_EPRI/SNL Microgrid
 
How to add security in dataops and devops
How to add security in dataops and devopsHow to add security in dataops and devops
How to add security in dataops and devops
 
Abbas
AbbasAbbas
Abbas
 
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
Adopting Cloud Testing for Continuous Delivery, with the premier global provi...
 
Howell_2022 Defense Standardization Program (DSP) Conference Presentation_fin...
Howell_2022 Defense Standardization Program (DSP) Conference Presentation_fin...Howell_2022 Defense Standardization Program (DSP) Conference Presentation_fin...
Howell_2022 Defense Standardization Program (DSP) Conference Presentation_fin...
 
Case Study: Practical tools and strategies for tackling legacy practices and ...
Case Study: Practical tools and strategies for tackling legacy practices and ...Case Study: Practical tools and strategies for tackling legacy practices and ...
Case Study: Practical tools and strategies for tackling legacy practices and ...
 
Online student management system
Online student management systemOnline student management system
Online student management system
 
IRJET- An Sla-Aware Cloud Coalition Formation Approach for Virtualized Networks.
IRJET- An Sla-Aware Cloud Coalition Formation Approach for Virtualized Networks.IRJET- An Sla-Aware Cloud Coalition Formation Approach for Virtualized Networks.
IRJET- An Sla-Aware Cloud Coalition Formation Approach for Virtualized Networks.
 
Project Deliverable 5 Infrastructure and SecurityThis assignm.docx
Project Deliverable 5 Infrastructure and SecurityThis assignm.docxProject Deliverable 5 Infrastructure and SecurityThis assignm.docx
Project Deliverable 5 Infrastructure and SecurityThis assignm.docx
 
Csqe sample exam 1 solutions 05.00.04
Csqe sample exam 1   solutions 05.00.04Csqe sample exam 1   solutions 05.00.04
Csqe sample exam 1 solutions 05.00.04
 
project plan
project planproject plan
project plan
 
Creating a Step Change in Cyber Security | ISCF DSbD Business-led Demonstrato...
Creating a Step Change in Cyber Security | ISCF DSbD Business-led Demonstrato...Creating a Step Change in Cyber Security | ISCF DSbD Business-led Demonstrato...
Creating a Step Change in Cyber Security | ISCF DSbD Business-led Demonstrato...
 

Mais de Neo4j

Mais de Neo4j (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 

Último

Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Cyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdfCyberprint. Dark Pink Apt Group [EN].pdf
Cyberprint. Dark Pink Apt Group [EN].pdf
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

014 Graph Database Techniques to Reduce Risk and Innovate - NODES2022 AMERICAS Beginner 6 - Michael Bandor.pdf

  • 1. 1 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Software Engineering Institute Carnegie Mellon University Pittsburgh, PA 15213 Graph Database Techniques to Reduce Risk and Innovate NODES 2022 Conference Michael Bandor
  • 2. 2 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Document Markings Copyright 2022 Carnegie Mellon University. This material is based upon work funded and supported by the Department of Defense under Contract No. FA8702-15-D-0002 with Carnegie Mellon University for the operation of the Software Engineering Institute, a federally funded research and development center. References herein to any specific commercial product, process, or service by trade name, trade mark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by Carnegie Mellon University or its Software Engineering Institute. NO WARRANTY. THIS CARNEGIE MELLON UNIVERSITY AND SOFTWARE ENGINEERING INSTITUTE MATERIAL IS FURNISHED ON AN "AS-IS" BASIS. CARNEGIE MELLON UNIVERSITY MAKES NO WARRANTIES OF ANY KIND, EITHER EXPRESSED OR IMPLIED, AS TO ANY MATTER INCLUDING, BUT NOT LIMITED TO, WARRANTY OF FITNESS FOR PURPOSE OR MERCHANTABILITY, EXCLUSIVITY, OR RESULTS OBTAINED FROM USE OF THE MATERIAL. CARNEGIE MELLON UNIVERSITY DOES NOT MAKE ANY WARRANTY OF ANY KIND WITH RESPECT TO FREEDOM FROM PATENT, TRADEMARK, OR COPYRIGHT INFRINGEMENT. [DISTRIBUTION STATEMENT A] This material has been approved for public release and unlimited distribution. Please see Copyright notice for non-US Government use and distribution. This material may be reproduced in its entirety, without modification, and freely distributed in written or electronic form without requesting formal permission. Permission is required for any other use. Requests for permission should be directed to the Software Engineering Institute at permission@sei.cmu.edu. DM22-1020
  • 3. 3 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Presentation Topics Problem Space Modeling the Problem Prototype Implementation Future Considerations - Relationship to a Software Bill of Materials (SBOM) Questions
  • 4. 4 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Graph Database Techniques to Reduce Risk and Innovate Problem Space
  • 5. 5 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. A US Department of Defense (DoD) customer was struggling to identify the risk associated with software obsolescence in their system Contractor was tracking software information across various databases and providing a spreadsheet every month to the DoD customer • Tendency for organizations tend to use tools such as spreadsheets to track information since they are readily available • A spreadsheet does not provide additional insight into (1) the relationships in the data and (2) any ripple effects. These relationships and ripple effects directly affect any risks the program may be tracking, while others may not be known because of secondary and tertiary relationships in the data. A prototype implementation using Neo4j was created to ingest the contractor provided data to create a knowledge graph to better understand the relationships in the data and manage risk. Background
  • 6. 6 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Problem Space Problem: Tracking COTS, GOTS and FOSS obsolescence issues and vulnerabilities in software intensive systems Questions that need to be asked: • What is affected? • When is the product no longer supported? • What is the immediate impact? • What are the secondary and tertiary dependencies? • Where are the products located/used in the system and environments? • What are the impacts of identified vulnerabilities (CVEs)? • How do we track this?
  • 7. 7 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Problem Space Organizations tend to use whatever is available (typically a spreadsheet) to identify and track the issues Tracking relationships and impacts gets to be a very complicated situation at best Visualization of the impacts is extremely difficult if even possible A Design Structure Matrix (DSM)1 approach would show clusters but not secondary and tertiary dependencies. 1 This is sometimes also referred to as dependency structure matrix, dependency structure method, dependency source matrix, etc. “ https://dsmweb.org/ , https://en.wikipedia.org/wiki/Design_structure_matrix
  • 8. 8 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Graph Database Techniques to Reduce Risk and Innovate Modeling the Problem
  • 9. 9 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Property Graph Representation Why a graph representation? • Everything is naturally connected, networks of people, transactions, supply chains • “Graphs form the foundation of modern data and analytics techniques, with capabilities to enhance and improve user collaboration, Machine Learning models, and explainable Artificial Intelligence.” – Gartner, “Top 10 Tech Trends in Data and Analytics,” 16 Feb 20211 A property graph lets the problem be represented through Nodes and Relationships of the nodes to each other 1 https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021
  • 10. 10 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Property Graph Representation Nodes* are the entities in the graph. • They represent objects (nouns) • They can hold any number of attributes (key-value pairs) called properties. • Nodes can be tagged with labels, representing their different roles in your domain. • Node labels may also serve to attach metadata (such as index or constraint information) to certain nodes. * The Property Graph Model, https://neo4j.com/developer/graph-database/
  • 11. 11 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Property Graph Representation Relationships* provide directed, named, semantically relevant connections between two node entities (e.g., Employee WORKS_FOR Company). • Relationships connect nodes and represent actions (verbs) • A relationship always has a direction, a type, a start node, and an end node. • Like nodes, relationships can also have properties. In most cases, relationships have quantitative properties, such as weights, costs, distances, ratings, time intervals, or strengths. • Due to the efficient way relationships are stored, two nodes can share any number or type of relationships without sacrificing performance. • Although they are stored in a specific direction, relationships can always be navigated efficiently in either direction. * The Property Graph Model, https://neo4j.com/developer/graph-database/
  • 12. 12 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Property Graph Representation Example
  • 13. 13 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Modeling the Problem Using a product centric approach, what information needs to be tracked? A product has: • Name & Version • Manufacturer (Company) • End of Support Date (EOS) • End of Extended Support Date (EEOS) • End of Life Date (EOL) • Category (COTS/GOTS/FOSS) • Possible dependency on another software product • Runs on an operating system • May have been through a formal obsolescence evaluation process • May have one or more vulnerabilities
  • 14. 14 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Modeling the Problem An operating system has: • Name & Version • Manufacturer • End of Support Date (EOS) • End of Extended Support Date (EEOS) • End of Life Date (EOL) • May have been through a formal obsolescence evaluation process • May have one or more vulnerabilities Operating Systems tend to have a different impact on obsolescence issues and vulnerabilities than other software products Recommend tracking as separate entity either as a node or additional node label
  • 15. 15 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Modeling the Product Defining the nodes (Product, OS, etc.) and the relationships: A Product: • Runs on an OS (RUNS_ON relationship) • May depend on another product (DEPENDS_ON relationship where it exists) • Is created by a company (represented as a node; CREATED_BY relationship) • May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence issue has been identified and is being tracked externally, i.e., Jiratm ticket) • May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it exists) • Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship) Jira is a trademark of Atlassian
  • 16. 16 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Modeling the Problem An OS: • Is created by a company (represented as a node; CREATED_BY relationship) • May have an obsolescence issue (represented as a node; HAS relationship where an obsolescence issue has been identified and is being tracked externally, i.e., Jiratm ticket) • May contain one or more vulnerabilities (represented as nodes; CONTAINS relationship where it exists) • Is installed in one or more environments (represented as nodes; INSTALLED_IN relationship) Obsolescence issue is intentionally modeled as a separate node • Contains a link to a Jira ticket • There may be more than one product covered under a ticket • Also contains risk information
  • 17. 17 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Modeling the Problem INSTALLED_IN CONTAINS CONTAINS ENVIRONMENT PRODUCT OS VULNERABILITY Company Name: OBS Issue
  • 18. 18 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Graph Database Techniques to Reduce Risk and Innovate Prototype Implementation
  • 19. 19 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Possible Use Case Scenarios 1. A manufacturer announces its software product will no longer be supported next year. What are the impacts to the system? 2. A vulnerability has been identified in a software product. Where is it installed in the system in order to determine the risk? 3. A company sells a product line to a foreign company. Where is this product in the system and what is the impact? 4. A new environment is going to be established to support product testing. What products need to be on that environment? 5. A newer version of a product is now available. Is there anything keeping the upgrade from happening (dependencies)?
  • 20. 20 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Prototype Implementation
  • 21. 21 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Prototype Implementation Goals of the implementation: • Use the existing data source (spreadsheet) from the contractor • Government customer would ingest the spreadsheet each month • Government customer would perform the analysis • Provide installation instructions, model details and documentation, use case scenarios with examples and Cypher scripts, primary script for initial data ingest and updates • Make it as easy to use as possible
  • 22. 22 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Prototype Implementation Implementation Steps: 1. Data analysis and understanding of how the data was being used 2. Develop and refine the model 3. Conduct small, test sample imports to get the logic correct 4. Based on prior tests, build the entire ingest script to ingest the spreadsheet data using APOC 5. Determine how to handle data entry errors (e.g., empty fields that are used as indexes, typographical errors in some data elements, date representation, etc.) 6. Confirm the import and subsequent graph was correct – refine import script and model were needed 7. Implement revisions to the import script to handle subsequent updates (e.g., flagging new nodes and updated node data) 8. Perform analyses using the use case scenarios to prove the correctness of the model and assumptions 9. Tune the script and graph model for better performance 10. Create support documentation and provide all deliverables to the customer
  • 23. 23 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Prototype Implementation - Results End results: 3 different month’s worth of data ingested 1554 nodes created/updated • 134 Companies • 1074 Products • 164 Jira tickets • 12 Software Categories • 166 End of Support date clusters
  • 24. 24 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Prototype Implementation – Analysis Examples Product End of Support Clusters Jira Ticket Clusters Previously unrealized/unseen clusters of data!
  • 25. 25 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Implementation Prototype - Challenges • Data source (spreadsheet) was created from 3 separate databases (multiple data elements in each row) • Getting the “magic decoder” to understand what each column actually meant and was used for (only a small fraction of the columns were actually used for the prototype) • Determining the best way to parse the spreadsheet (single path with refinements vs. multiple passes) • A single pass method was used and then subsequent passes through the nodes to further refine the graph (including cleanup of temporary properties in the Product node) • Use of an End_of_Service node for performance and analysis vs. parsing the EndofService date in each node • Refactoring the initial ingest script to handle updates rather than using 2 separate scripts • Use of a dateCreated and dateUpdated property when performing the MERGE operation with the nodes (e.g.. Set the dateCreated with the ON CREATE option and set the dateUpdated with the ON MERGE option) • Documenting the model, use cases (with examples) and installation instructions for novice users
  • 26. 26 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Graph Database Techniques to Reduce Risk and Innovate Future Considerations - Relationship to a Software Bill of Materials (SBOM)
  • 27. 27 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Future Considerations - Relationship to a Software Bill of Materials (SBOM) What is an SBOM? An SBOM is a formal record containing the details and supply chain relationships of various components used in building software. In addition to establishing these minimum elements, this report defines the scope of how to think about minimum elements, describes SBOM use cases for greater transparency in the software supply chain, and lays out options for future evolution.1 Relationships to Graphs: Depth. An SBOM should contain all primary (top level) components, with all their transitive dependencies listed. At a minimum, all top-level dependencies must be listed with enough detail to seek out the transitive dependencies recursively. Going further into the graph will provide more information. As organizations begin SBOM, depth beyond the primary components may not be easily available due to existing requirements with subcomponent suppliers. Eventual adoption of SBOM processes will enable access to additional depth through deeper levels of transparency at the subcomponent level. It should be noted that some use cases require complete or mostly complete graphs, such as the ability to “prove the negative” that a given component is not on an organization’s network.1 1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce, July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
  • 28. 28 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Future Considerations - Relationship to a Software Bill of Materials (SBOM) Relationships to Graphs (cont): Known Unknowns. For instances in which the full dependency graph is not enumerated in the SBOM, the SBOM author must explicitly identify “known unknowns.” That is, the dependency data draws a clear distinction between a component that has no further dependencies, and a component for which the presence of dependencies is unknown and incomplete... Other Component Relationships. The minimum elements of SBOM are connected through a single type of relationship: dependency. That is, X is included in Y. This relationship is implied in the SBOM graph structure…1 There is a very noticeable correlation between the “minimum” elements in an SBOM and that used in the software obsolescence graph! • Candidate for future research and implementation 1 The Minimum Essential Elements of a Software Bill of Materials, United States Department of Commerce, July 12, 2021, https://www.ntia.doc.gov/files/ntia/publications/sbom_minimum_elements_report.pdf
  • 29. 29 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Questions
  • 30. 30 Graph Database Techniques to Reduce Risk and Innovate © 2022 Carnegie Mellon University [[DISTRIBUTION STATEMENT A] Approved for public release and unlimited distribution. Contact Information Michael S. Bandor Senior Software Engineer Software Engineering Institute (SEI) Carnegie Mellon University (CMU) mbandor@sei.cmu.edu 412-268-8423