Graph Hardware Architecture - Enterprise graphs deserve great hardware!
1. Graph Hardware Architecture
Enterprise graphs deserve great hardware!
Dan McCreary, Distinguished Engineer, Optum
Nikhil Deshpande, Director, AI & HPC, Intel Corp.
Graph + AI World 2020
September 29, 2020
2. About Dan
• Distinguished Engineer at Optum Healthcare
(330K employees and 32K technical staff)
• Focused on AI enterprise knowledge graphs
• Helps create the world's largest healthcare
graph
• Coauthor of the book "Making Sense of NoSQL"
• Worked at Bell Labs as a VLSI circuit designer
• Worked for Steve Jobs at NeXT
3. Clinical Data Fits Well With Graph Database Models
[Figure labels: Clinical Data, Graph]
• There is a natural fit between clinical data and graph technologies
• Clinical data is both complex and highly variable
• Graphs are ideal for storing complex and highly variable data
4. Business Impact of Moving from RDBMS to Graph
Relational | Graph | Business Impact of Using Graph for Clinical Data
Relationships calculated at query time | Relationships calculated at load time | Relationship-intensive clinical data is queried faster
Fixed data model | Flexible data model | It is easier to add new clinical data to a graph at any time
Limited analytical algorithms | Rich library of analytical algorithms such as pathfinding, PageRank, clustering, random walk and similarity | It is easy to execute patient clustering, similarity and care path recommendation engines
Difficult to build custom hardware to optimize algorithms | Easy to build custom hardware to optimize algorithms | Hardware for optimizing algorithms is low cost
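
The first row of the comparison (relationships resolved at query time versus load time) can be illustrated with a deliberately simplified sketch. The toy rows and function names below are hypothetical, and real relational engines use indexes and join planning rather than a plain scan; the point is only where the relationship work happens.

```python
from collections import defaultdict

# Hypothetical toy data: (patient_id, condition_id) rows, as a table might hold them.
DIAGNOSES = [("p1", "c1"), ("p1", "c2"), ("p2", "c2"), ("p3", "c3")]


def conditions_relational(patient_id, rows):
    """Resolve the relationship at query time by scanning the row table."""
    return sorted(c for p, c in rows if p == patient_id)


def build_adjacency(rows):
    """Resolve relationships once, at load time, into per-vertex edge lists."""
    adjacency = defaultdict(list)
    for patient, condition in rows:
        adjacency[patient].append(condition)
    return adjacency


def conditions_graph(patient_id, adjacency):
    """At query time, follow the precomputed edges with a direct lookup."""
    return sorted(adjacency.get(patient_id, []))
```

Both functions return the same answer for a given patient; the graph-style version pays the cost once in build_adjacency instead of on every query.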
5. The Real Time Clinical Decision Support Challenge
• A new patient walks into the ER, urgent care or our clinic
• We gather patient data into the Electronic Medical Record (EMR)
• How quickly can we compare this patient with the best outcomes of 250
million other patients? Goal: 200 msec
[Figure labels: Patient Data, 250M Patients, Recommended Care Path]
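
To get a feel for the 200 msec goal, the back-of-the-envelope arithmetic can be written out as below. The patient count and time budget come from the slide; the core count in the usage note is purely an illustrative assumption, not a figure from the presentation.

```python
PATIENTS = 250_000_000   # from the slide
BUDGET_MS = 200          # from the slide


def required_rate_per_second(items, budget_ms):
    """Comparisons per second needed to touch every item within the budget."""
    return items * 1000 / budget_ms


def per_core_rate(items, budget_ms, cores):
    """Share of the required rate each core carries, assuming an even split."""
    return required_rate_per_second(items, budget_ms) / cores
```

Under these numbers the system must sustain 1.25 billion patient comparisons per second. With a hypothetical 1,000 cores splitting the work evenly, each core would need about 1.25 million comparisons per second.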
6. How We Got Here: Pointer Hopping
• Several years ago we started working on graph scaling
problems
• Graph traversal is the key function that must scale in
parallel
• Graph traversal is all about pointer hopping
• Having a VLSI circuit design background helped me
visualize how narrow an instruction set is needed to
do this quickly
• I realized that neither the instruction set nor the memory
access patterns of modern general-purpose CISC hardware
are optimized for this workload
• We needed a new distributed graph architecture with
many cores and faster memory access patterns
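
As a rough illustration of why traversal amounts to pointer hopping, here is a minimal breadth-first traversal over a compressed sparse row (CSR) layout. CSR is an assumption chosen for the sketch, not necessarily the layout the authors use, and the function names are hypothetical. Note how small the inner loop is: read an offset, read a target, test whether it has been seen.

```python
from collections import deque


def build_csr(num_vertices, edges):
    """Pack directed (src, dst) edges into CSR offsets/targets arrays."""
    counts = [0] * num_vertices
    for src, _ in edges:
        counts[src] += 1
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + counts[v]
    targets = [0] * len(edges)
    cursor = offsets[:-1].copy()  # next free slot for each source vertex
    for src, dst in edges:
        targets[cursor[src]] = dst
        cursor[src] += 1
    return offsets, targets


def hops_from(start, offsets, targets):
    """Breadth-first traversal; returns the hop count to each reachable vertex."""
    distance = {start: 0}
    frontier = deque([start])
    while frontier:
        v = frontier.popleft()
        for i in range(offsets[v], offsets[v + 1]):
            neighbor = targets[i]  # one "pointer hop"
            if neighbor not in distance:
                distance[neighbor] = distance[v] + 1
                frontier.append(neighbor)
    return distance
```

The reads of targets land at data-dependent positions, which is the kind of memory access pattern that caches and wide instruction sets do little to help.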
7. Example: Learning Embeddings Via Random Walk
• NLP technologies have shown us that we can
create low-dimensional embeddings for every
word by looking at the sequence of words in text.
• Using Random Walk algorithms we can create a
similar sequence of items to create embeddings
for primary Vertices (Customer, Product etc.)
• We can use these embeddings for 50 msec
comparisons over 100M items if we do the
calculations in parallel
• Random Walks are very CPU intensive and
require traversal of hundreds of millions of
edges
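
The walk-generation step described above can be sketched as follows. This is a minimal uniform random walk, assuming an adjacency dictionary keyed by vertex; the function names and parameters are hypothetical, and the sketch stops before training. The resulting sequences would be fed to a word-embedding style model in place of sentences.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Follow uniformly random edges from start, up to length vertices."""
    walk = [start]
    current = start
    for _ in range(length - 1):
        neighbors = adjacency.get(current, [])
        if not neighbors:
            break  # dead end: stop the walk early
        current = rng.choice(neighbors)
        walk.append(current)
    return walk


def generate_walks(adjacency, walks_per_vertex, length, seed=0):
    """Produce walks_per_vertex walks starting from every vertex."""
    rng = random.Random(seed)
    walks = []
    for vertex in adjacency:
        for _ in range(walks_per_vertex):
            walks.append(random_walk(adjacency, vertex, length, rng))
    return walks
```

Every step of every walk is one edge traversal, so the total work grows as vertices × walks per vertex × walk length, which is how the edge count reaches the hundreds of millions on large graphs.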
Embedding: 200 32-bit integers
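
The embedding size stated on the slide implies a concrete memory footprint, worked out in the sketch below. The 100M item count comes from the bullet list above; using a dot product as the similarity score is an assumption made for illustration, since the slide does not name the comparison function.

```python
DIMENSIONS = 200      # from the slide
BYTES_PER_VALUE = 4   # a 32-bit integer


def embedding_bytes(dimensions=DIMENSIONS, bytes_per_value=BYTES_PER_VALUE):
    """Size of one embedding in bytes."""
    return dimensions * bytes_per_value


def total_bytes(items, dimensions=DIMENSIONS, bytes_per_value=BYTES_PER_VALUE):
    """Size of the embeddings for all items, in bytes."""
    return items * embedding_bytes(dimensions, bytes_per_value)


def dot(a, b):
    """Integer dot product, used here as an assumed similarity score."""
    return sum(x * y for x, y in zip(a, b))


def most_similar(query, candidates):
    """Return the key of the candidate embedding with the highest score."""
    return max(candidates, key=lambda key: dot(query, candidates[key]))
```

Each embedding is 800 bytes, so 100M items occupy about 80 GB. Scanning that much data inside a 50 msec window is only plausible when the embeddings are partitioned across many cores and memory channels.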
17. Conclusion: Think BIG about your EKG!
• We encourage everyone to continue to
“think big” when doing strategic planning
around your Enterprise Knowledge Graphs
• Many organizations besides Google,
Amazon, Facebook etc. will be building 100B
to 1T vertex graphs in the next few years to
better serve their customers
• New hardware will soon be arriving that will
dramatically accelerate today’s graph
queries
• Real-time product recommendation systems
that take complex content into account will
be cost effective and accelerate data-driven
decision making in healthcare and other
industries