As adoption of AI technologies increases and matures, the focus will shift from exploration to time to market, productivity and integration with existing workflows. Governing enterprise data, scaling AI model development, and selecting a complete, collaborative hybrid platform and tools for rapid solution deployment are key focus areas for growing data science teams tasked with responding to business challenges. This talk covers the challenges and innovations for AI at scale in industries such as healthcare and automotive, the AI ladder and AI life cycle, and infrastructure architecture considerations.
5. Analytics Modernization: From Data to Actions
[Figure: analytics stages leading from DATA and HUMAN INPUTS to ACTION]
Descriptive: What has happened?
Predictive: What will happen?
Prescriptive: What should we do?
Cognitive: Learn dynamically
delivering faster insights with greater efficiency to impact more lives
6. Three broad categories of AI Use Cases
“Structured” Data Use Cases
- Big Data (rows and columns)
- Available AI software, more accuracy!
Computer Vision Use Cases
- This is sort of “magic”
- A deep learning model is trained to detect and classify objects
Natural Language Processing Use Cases
- A model learns to read, hear and “understand” language
9. A framework for designing, deploying, growing and optimizing infrastructure for HPC, AI and Cloud, created in
collaboration with the world’s leading healthcare and life sciences institutions, and using Red Hat OpenShift, IBM
Power Systems, IBM Storage and open API endpoints.
From Data to Insight with an Optimal Reference Architecture
DATAHUB
High Performance Data Fabric & Catalog
Capable of Handling Exabytes of Data
and Trillions of Objects
ORCHESTRATION
High Performance Computing & AI
Platform Capable of Orchestrating
Thousands of Servers and GPUs
APPS & MODELS
Large-scale and high-throughput
workloads such as HPC, AI and Cloud
computing
MEDICAL TASKS
Genomics, molecular simulation,
structural analysis, diagnostics, data
fusion, manufacturing quality inspection.
10. Smart loves problems, and there has never been a bigger
problem facing our world.
Biomolecular Structure
Molecular Simulation
Genomics
Medical Diagnostics AI
Data Fusion and AI
Bio-Informatics
Artificial intelligence and high-performance computing have already begun to attack the
virus, assisting in molecular drug discovery, genomics and medical image processing.
11. Five key challenges to progress remain despite advances
Data Overload: Oceans of data arise from rapid digitization and instrumentation of healthcare.
App Chaos: Thousands of applications, workflows and models are not all following the same rules.
Adoption: Vertically integrated toolsets with heavy customization and vendor lock-in create work silos.
Performance: When scaling up or out, most institutions cannot diagnose or analyze the performance problems they face.
Cost: Demanding workloads require well-orchestrated infrastructure to manage, monitor and control costs.
16. Optimizing Medical Imaging
Enhance image identification with deep learning
to assist physicians and benefit patients
A model was trained on 1,300 MRI images using IBM Power
Systems and IBM Storage in just two hours,
compared to forty hours on traditional
architectures
18. Massive Data Sets Require Massive Processing Capability
BIOMOLECULAR STRUCTURE
Advances in instrument design, sample preprocessing and mathematical methods have enabled high-volume, high-throughput imaging at atomic scale.
Cryogenic electron microscopes generate an average of 5 TB of image data per day
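As a rough sizing sketch, the 5 TB per day figure above can be converted into a sustained ingest rate and a yearly volume. The calculation assumes decimal terabytes, a single microscope and continuous acquisition around the clock; none of these assumptions come from the slide, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12  # assumption: decimal terabytes
BYTES_PER_MB = 10**6

def sustained_rate_mb_per_s(tb_per_day):
    """Average write rate needed to keep up with a daily data volume."""
    return tb_per_day * BYTES_PER_TB / SECONDS_PER_DAY / BYTES_PER_MB

def yearly_volume_pb(tb_per_day, days=365):
    """Data accumulated over a year, in petabytes (1 PB = 1000 TB)."""
    return tb_per_day * days / 1000

rate = sustained_rate_mb_per_s(5)   # 5 TB/day is the figure from the slide
yearly = yearly_volume_pb(5)
print(f"{rate:.1f} MB/s sustained, {yearly:.3f} PB per year per instrument")
```

Real acquisition is bursty, so peak rates would likely be well above this average.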
19. Accelerating Cryo-EM Imaging Analysis
Reduced time-to-completion for high resolution image
analysis jobs while increasing resource utilization
Using an IBM AC922 cluster, more than 100 high-resolution
cryo-EM image analysis jobs run
in parallel on the Satori cluster
BIOMOLECULAR STRUCTURE
20. Simulation of millions of atoms requires large computational resources
Large-scale simulation includes millions of atoms
• Virus molecules
• Ribosomes
• Bioenergy systems and complexes
Solution
• High performance computing CPUs and GPUs accelerating performance
• Optimal memory and network bandwidths scaling performance to hundreds of nodes
• Techniques to reduce the number of simulations
[Figure panels: virus molecule simulation, receptor-ligand fit, cryptic binding site prediction, binding energy prediction]
MOLECULAR SIMULATION
21. Molecular Dynamics Simulation Computational Intensity
A) Using NAMD to simulate influenza virus (left) and Covid-19 (right)
B) Drug discovery: protein receptor
C) In silico prediction of protein cryptic binding site
D) Predicting protein receptor-ligand binding energy
22. Optimizing Molecular Modeling
Bayesian optimization accelerated workflow uses 1/3 of the calculations to achieve a resolution increase of 4 orders of magnitude
Achieves human-level performance in days instead of months.
[Figure panels: Accelerated Force Field Tuning, Intelligent Phase Diagram Exploration]
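To make the idea of Bayesian optimization concrete, here is a minimal generic sketch. It is not IBM BOA and makes no claim about how BOA works internally. It assumes a Gaussian-process surrogate with an RBF kernel, an expected-improvement acquisition function and a toy one-dimensional objective standing in for an expensive simulation. All function names and parameter values are illustrative.

```python
import math
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two sets of 1-D points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k_tt = rbf_kernel(x_train, x_train) + jitter * np.eye(len(x_train))
    k_tq = rbf_kernel(x_train, x_query)
    weights = np.linalg.solve(k_tt, k_tq)
    mean = weights.T @ y_train
    var = 1.0 - np.sum(k_tq * weights, axis=0)
    return mean, np.sqrt(np.clip(var, 1e-12, None))

_erf = np.vectorize(math.erf)

def expected_improvement(mean, std, best):
    """Expected improvement over the best value seen so far (maximization)."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf

def bayes_opt(objective, candidates, n_init=3, n_iter=15, seed=0):
    """Evaluate only n_init + n_iter candidates, chosen by the surrogate."""
    rng = np.random.default_rng(seed)
    tried = [int(i) for i in rng.choice(len(candidates), size=n_init, replace=False)]
    values = [objective(candidates[i]) for i in tried]
    for _ in range(n_iter):
        y = np.array(values)
        scale = y.std() or 1.0                 # avoid dividing by zero
        y_norm = (y - y.mean()) / scale        # standardize for the unit-variance prior
        mean, std = gp_posterior(candidates[tried], y_norm, candidates)
        score = expected_improvement(mean, std, y_norm.max())
        score[tried] = -np.inf                 # never repeat an evaluation
        nxt = int(np.argmax(score))
        tried.append(nxt)
        values.append(objective(candidates[nxt]))
    best = int(np.argmax(values))
    return candidates[tried[best]], values[best], len(values)

def toy_objective(x):
    """Stand-in for an expensive simulation; its maximum is at x = 0.7."""
    return -(x - 0.7) ** 2

grid = np.linspace(0.0, 1.0, 201)
best_x, best_value, n_evals = bayes_opt(toy_objective, grid)
print(f"best x = {best_x:.3f} after {n_evals} of {len(grid)} possible evaluations")
```

The saving comes from spending a cheap model update per step in order to skip most evaluations of the expensive objective. A brute-force sweep of the same grid would run all 201.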
23. Bayesian Optimization Value
Faster
BOA accelerates time to insight, time to value, and time to design by factors
Example: IBM EDA ->100x faster than brute force
Better
BOA can find new and unknown optima in a design space because of its lack of bias and exploration algorithm
Example: Infineon, 3x faster than other methods and 4 orders of magnitude better resolution
Cheaper
Nothing is cheaper than a simulation which is never run. BOA prevents unnecessary work, which reduces all kinds of costs
Example: GlaxoSmithKline reduced their screen workload from 20k experiments to 200
[Chart: Search Method Comparison, Drug Discovery Case, Single Objective, All Data / Ties removed. Count (0 to 800) per method: BOA, Greedy, Similarity, Diversity]
Conclusion: >80% of the time IBM BOA is the best method with the least regret
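The slide does not define "regret". A common definition, assumed here, is simple regret: the gap between the true optimum and the best value a search method has found so far. The sketch below computes it for a hypothetical sequence of observed values; the numbers are made up for illustration and are not taken from the drug discovery case.

```python
def simple_regret(observed, optimum):
    """Gap between the true optimum and the best value found after each step."""
    best_so_far = float("-inf")
    regrets = []
    for value in observed:
        best_so_far = max(best_so_far, value)
        regrets.append(optimum - best_so_far)
    return regrets

# Hypothetical objective values returned by one search method, in order.
observed_values = [0.2, 0.5, 0.4, 0.9]
regret_curve = simple_regret(observed_values, optimum=1.0)
print(regret_curve)
```

Under this definition, a method with the least regret is the one whose curve ends closest to zero for a given evaluation budget.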
24. Speed time to value, with pre-built AI apps and learnings from thousands of AI engagements
Cognitive car manual explaining increased vehicle complexity
Generate new data-driven revenue stream with contextual connected services
Improved customer safety and recall engagement
Delivering superior quality using AI and edge computing
Streamlining the recruitment process and saving 40 percent of the time needed in application handling
Use case areas: Connected Vehicle / AD, Risk & Compliance, Customer Experience, Manufacturing, IT Operations
25. Designing a Formula 1 car is complex.
Validating component design is crucial,
but testing aerodynamics, either
physically or by simulation, is costly.
32. Data Science Exploration to Production
Use Case Exploration
Data Science Model Build
Use Case Deployment in Production
Requires solution architecture
Deploy
Source: https://papers.nips.cc/paper/5656-hidden-technical-debt-in-machine-learning-systems.pdf
Security, Privacy and Governance
36. Metadata-Fueled Data Analysis
Large Scale Data Ingest
• Scan records at high speed
• Live event notifications
• Capture system-level tags
• Automatic indexing
Business-Oriented Data Mapping
• Custom data tagging
• Content-inspection via APIs
• Policy-driven workflows
Data Activation
• Data movement via APIs
• Extensible architecture
• Solution Blueprints
Data Visualization
• Query billions of records in seconds
• Multi-faceted search
• Drilldown dashboard
• Customizable reports
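The ingest, tagging and search steps above can be sketched as a tiny in-memory catalog. This is only an illustration of the pattern, not the API of any IBM product. The class, method names, paths and tag values are all hypothetical, and a real system would index billions of records in a distributed store.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Record:
    path: str
    system_tags: dict                      # captured automatically at ingest
    custom_tags: dict = field(default_factory=dict)

class MetadataCatalog:
    """Hypothetical catalog: an inverted index from (key, value) to records."""

    def __init__(self):
        self.records = []
        self.index = {}                    # (key, value) -> set of record positions

    def ingest(self, path, **system_tags):
        """Scan a record, capture system-level tags and index them."""
        self.records.append(Record(path, dict(system_tags)))
        pos = len(self.records) - 1
        for item in system_tags.items():
            self.index.setdefault(item, set()).add(pos)
        return pos

    def tag(self, pos, key, value):
        """Attach a business-oriented custom tag, replacing any old value."""
        old = self.records[pos].custom_tags.get(key)
        if old is not None:
            self.index[(key, old)].discard(pos)
        self.records[pos].custom_tags[key] = value
        self.index.setdefault((key, value), set()).add(pos)

    def search(self, **facets):
        """Multi-faceted search: records matching every given key=value pair."""
        hits = set(range(len(self.records)))
        for item in facets.items():
            hits = hits & self.index.get(item, set())
        return [self.records[p].path for p in sorted(hits)]

    def facet_counts(self, key):
        """Counts per value for one facet, as a dashboard drilldown would show."""
        return Counter({v: len(p) for (k, v), p in self.index.items() if k == key and p})

catalog = MetadataCatalog()
first = catalog.ingest("/scans/s1.mrc", owner="lab-a", ext="mrc")
second = catalog.ingest("/scans/s2.mrc", owner="lab-b", ext="mrc")
catalog.ingest("/reads/r1.fastq", owner="lab-a", ext="fastq")
catalog.tag(first, "project", "cryo-em")
catalog.tag(second, "project", "cryo-em")

lab_a_scans = catalog.search(ext="mrc", owner="lab-a")
ext_counts = catalog.facet_counts("ext")
print(lab_a_scans, dict(ext_counts))
```

Keeping the index keyed by tag rather than by file is what lets queries run without touching the underlying data.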
37. The Data: Biological Data Analytics
[Diagram: Biological Data Analysis, comprising genomics sequence data analysis, biomarker identification, structural bioinformatics, biodata modeling and statistical analysis, medical image data analysis, and biodata visualization]
Biological Data Analytics
• Genomic Sequence Data: an explosive growth of biodata
• Sequence alignment
• Variant discovery and characterization
• Genomic profiling and pattern discovery
• Biomarker Identification: gene expression profile, RNA-seq, ChIP-seq, microarray identification and validation, etc.
• Structural Bioinformatics: identify and predict 3D biomolecule structures, such as cryo-EM data refinement, molecular dynamics simulation, NMR, X-ray crystallographic data, etc.
• Biodata Modeling & Statistical Analysis: biological pathway analysis, gene and clinical data cohort studies, data extraction, etc.
• Medical Image Processing: image segmentation, registration, statistical modeling.
• Biodata Visualization: 3D molecule structures, genomic sequence visualization, etc.
Ruzhu Chen @ 2019
39. Optimizing Precision Genomics
Reduced time-to-completion for long-running
jobs while increasing resource utilization
Using IBM, Sidra has completed hundreds of
thousands of computing tasks comprising
millions of files and directories, without
experiencing system downtime.
41. OpenPOWER is a technical community
dedicated to expanding the IBM Power architecture ecosystem
https://github.com/open-ce
Open-CE
Minimize time to value for
foundational ML/DL packages
Provide a flexible source-to-image
solution for a complete and
customizable AI environment.
42. Anaconda Environment for Applications
• Use Anaconda Enterprise Network
(AEN) to manage the cryo-EM software
repository on the server.
• Software is easy to use and update
Anaconda Architecture for Cryo-EM Analysis
[Diagram labels: Computation; Web Interface; Repo Install; Software; Control; Authentication; Anaconda Server; Compute Nodes; Database Users]
44. Cloud Native Platforms
Agility • Efficiency • Cost Savings
Microservices: An architecture of loosely coupled data services, easily refactored to create containerized workloads
Containerized Workloads: Stand-alone workloads composed of microservices and data that are flexibly deployed, orchestrated and managed
Multicloud Provisioning: Agile provisioning of containerized workloads in multicloud environments and consumption of cloud services
Public Cloud, On-premises
IBM Cloud Pak for Data
45. Typical AI Inferencing Considerations
Data Pipeline - The data that is fed into models has to be cleaned and structured to produce accurate results
Real-Time (vs Batch) - Many AI applications have response times in milliseconds and in many cases handle 100K+ IoT events per second (Latency, Latency, Latency)
Scalability - Ability to scale the inference engine and manage infrastructure
Security - Applications running AI models in the field and in back offices
Multi-Tenancy - Multiple business applications leveraging shared infrastructure, multiple models per business application
Tools Proliferation - Analytics, Data/Object Tagging, Model Training and Inferencing
Model Management - Continuous Training/Re-Training of Models, AI-DevOps, Ease of Deployment
Transparency - Ability to explain decisions
ACCURACY
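The real-time and scalability considerations can be tied together with Little's law: requests in flight equal arrival rate times latency. The sketch below uses the 100K events per second figure from the slide. The 5 ms latency and the 32 concurrent requests per replica are assumed values chosen only to make the arithmetic concrete, and the function names are illustrative.

```python
import math

def in_flight_requests(events_per_second, latency_ms):
    """Little's law: average number of requests being served at once."""
    return events_per_second * latency_ms / 1000

def replicas_needed(events_per_second, latency_ms, concurrency_per_replica):
    """Inference replicas required to hold the in-flight load."""
    in_flight = in_flight_requests(events_per_second, latency_ms)
    return math.ceil(in_flight / concurrency_per_replica)

EVENTS_PER_SECOND = 100_000      # from the slide: 100K+ IoT events per second
LATENCY_MS = 5                   # assumption: per-event inference latency
CONCURRENCY_PER_REPLICA = 32     # assumption: requests one replica serves at once

in_flight = in_flight_requests(EVENTS_PER_SECOND, LATENCY_MS)
replicas = replicas_needed(EVENTS_PER_SECOND, LATENCY_MS, CONCURRENCY_PER_REPLICA)
print(f"{in_flight:.0f} requests in flight, about {replicas} replicas")
```

Halving latency halves the replica count at the same event rate, which is one reason latency gets repeated three times on the slide.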