What’s Next in
computing & the
role of cloud FPGAs
Dr. Dionysios Diamantopoulos
Research Staff Member,
Cloud FPGAs & Tape Group,
Cloud & AI Systems Research Department,
IBM Research Europe
Guest lecture at the Harokopio University of Athens, as part of the MSc program
of the Informatics & Telematics Department, invited by Prof. Sotiris Xydis.
21 Jan. 2021
IBM Legal Disclaimer
This content was provided for informational purposes only. The opinions and insights discussed are
those of the presenter and guests and do not necessarily represent those of the IBM Corporation.
Nothing contained in these materials or the products discussed is intended to, nor shall have the effect
of, creating any warranties or representations from IBM or its suppliers, or altering the terms and
conditions of any agreement you have with IBM.
The information presented is not intended to imply that any actions taken by you will result in any
specific result or benefit and should not be relied on in making a purchasing decision. IBM does not
warrant that any systems, products or services are immune from, or will make your enterprise immune
from, the malicious or illegal conduct of any party.
All product plans, directions and intent are subject to change or withdrawal without notice. References
to IBM products, programs or services do not imply that they will be available in all countries in which
IBM operates. IBM, the IBM logo, and other IBM products and services are trademarks of the
International Business Machines Corporation, in the United States, other countries or both. Other
company, product, or service names may be trademarks or service marks of others.
For copyright and trademark information go to: http://www.ibm.com/legal/us/en/copytrade.shtml
Beijing
Tokyo
Shin-Kawasaki
Delhi
Bangalore
Singapore
Nairobi
Haifa
Zurich
Warrington
Dublin
Cambridge
Albany
Yorktown
Almaden
Rio de Janeiro
Sao Paulo
Johannesburg
Melbourne
3000 Researchers
19 Locations
6 Continents
6 Nobel Laureates
10 Medals of Technology
5 National Medals of Science
6 Turing Awards
A legacy of world-class research
For 75 years, IBM Research has been propelling innovation for IBM, from the first
programmable computers to the quantum computers of today. More than anything,
our goal is to catalyze and drive the advancements that shape our world.
With more than 3,000 researchers across the globe, we are anticipating, examining, and
inventing What’s Next in science and technology every single day.
2019 IBM Project Debater
2018 Summit and Sierra: World’s Fastest Supercomputers
2017 Commercial Quantum Computing
2016 World’s first quantum computer on the cloud
2015 Watson Genomic Analytics for Personalized Cancer Treatment
2014 SyNAPSE: Biologically Inspired Neural Architecture
2013 Antimicrobial Polymers
2012 Atomic Imaging (Charge Distribution, Bond Order)
2011 Watson Wins Jeopardy!
2009 Nanoscale Magnetic Resonance Imaging (MRI)
2008 World’s First Petaflop Supercomputer
2007 Web-scale Mining
2005 Cell Broadband Engine
2004 Blue Gene/L
2003 5 Stage Carbon Nanotube Ring Oscillator
2000 Java Performance
1998 Silicon on Insulator (SOI)
1997 Copper Interconnect Wiring
1997 Deep Blue
1994 Silicon Germanium (SiGe)
1990 Chemically Amplified Photoresists
1987 High-Temperature Superconductivity (Nobel Prize)
1986 Scanning Tunneling Microscope (Nobel Prize)
1980 Reduced Instruction Set Computing (RISC)
1979 Thin Film Recording Heads
1973 Winchester Disk Drive
1971 Speech Recognition
1970 Relational Database
1967 Fractals
1966 One-Device Memory Cell
1957 FORTRAN
1956 Random Access Method of Accounting and Control (RAMAC)
© 2020 IBM Corporation
IBM Research Europe
Dublin, Daresbury, Hursley, Zurich
IBM Research – Zurich
Established in 1956
45+ different nationalities
Open Collaboration:
o Horizon2020: 50+ funded projects
and 500+ partners
Two Nobel Prizes:
o 1986: Nobel Prize in Physics for
the invention of the scanning
tunneling microscope by Heinrich
Rohrer and Gerd K. Binnig
o 1987: Nobel Prize in Physics for
the discovery of high-temperature
superconductivity by K. Alex
Müller and J. Georg Bednorz
European Physical Society Historic Site
Binnig and Rohrer Nanotechnology
Centre (Public Private Partnership with
ETH Zürich and EMPA)
7 European Research Council Grants
My office
# who am I
v.0.1
1985 2009
Ph.D. @ ECE, NTUA: “Cross-Layer Rapid Prototyping and Synthesis of Application-Specific and Reconfigurable Many-accelerator Platforms”
2015
Military service: IT Engineer @ Hellenic Army General Staff
2016
R&D Engineer, Startup, LN2
2016 2017
Postdoc Researcher, Heterogeneous Cognitive Computing Systems Group, Cloud & Computing Infrastructure Department, IBM Research – Zurich, “Transprecision Computing”
PhD Researcher and R&D engineer in ESA, EU and national funded projects
Postdoc Researcher, Cloud FPGAs and Tape Group, Cloud & AI Systems Research Department, IBM Research Europe, “Transprecision Computing”, “Near Memory Computing”, “cloudFPGA”
2019 2021
Research Scientist, Cloud FPGAs and Tape Group, Cloud & AI Systems Research Department, IBM Research Europe
Not necessarily linear scale. Time is relative (to your frame of reference).
Childhood & school @ Pylos, Greece
Met Prof. Sotirios Xydis. Enjoying our collaboration and friendship thereafter.
D.Eng. @ CEID, Univ. Patras: “Design and Implementation of a dual-processor (RISC) System-on-Chip targeting machine vision algorithms on FPGAs and eASICs”
We’re Inventing What’s Next in:
Hybrid Cloud
AI
Quantum
Science
IBM’s innovation: Topping the US patent list for 28
years running
https://www.ibm.com/blogs/research/2021/01/ibm-patent-leadership-2020/
From the automated teller machine (ATM), speech recognition technology, and DRAM to a novel way to search multilingual documents using NLP: 2,300 AI patents!
What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research
Maslow's hierarchy of needs:
Basic needs or physiology needs
A bit of motivation
“The basic need is a concept that was derived to explain and cultivate
the foundation for motivation.
This concept is the main physical requirement for human survival. This
means that basic needs are universal human needs. Basic needs, being
primal, are by default, a governor on the attainment of the "higher"
needs.
Efforts to accomplish higher needs may be interrupted temporarily by a
deficit of primal needs, such as a lack of food or air. Basic needs are
considered in internal motivation according to Maslow's hierarchy of
needs.
Maslow's idea is that humans are compelled to fulfill these basic needs
first to pursue intrinsic satisfaction on a higher level. If these needs
are not achieved, it leads to an increase in displeasure within an
individual. In return, when individuals feel this increase in displeasure,
the motivation to decrease these discrepancies increases.”
Food, Water Health Breathing Rest Warmth
Abraham Harold Maslow was a psychology professor at Alliant International University,
Brandeis University, Brooklyn College, New School for Social Research, and Columbia University.
Quoted text and image source: Wikipedia
Horizontal Needs: Physiological
Vertical
Needs
Maslow's hierarchy of needs:
Safety needs
A bit of motivation
“Once a person's physiological needs are relatively satisfied, their safety
needs take precedence and dominate behavior.
In the absence of physical safety – due to war, natural disaster, family
violence, childhood abuse, etc. and/or in the absence of economic safety
– (due to an economic crisis and lack of work opportunities) these safety
needs manifest themselves in ways such as a preference for job
security, grievance procedures for protecting the individual from
unilateral authority, savings accounts, insurance policies, disability
accommodations, etc.
This level is more likely to predominate in children as they generally
have a greater need to feel safe. It includes shelter, job security, health,
and safe environments. If a person does not feel safe in an environment,
they will seek safety before attempting to meet any higher level of
survival”.
Vertical
Needs
Physiological Needs
House Security Care
Horizontal Needs: Safety
Financial
Maslow's hierarchy of needs:
Social needs
A bit of motivation
“After physiological and safety needs are fulfilled, the third level of
human needs is interpersonal and involves feelings of belongingness.
According to Maslow, humans possess an affective need for a sense of
belonging and acceptance among social groups, regardless of whether
these groups are large or small. For example, some large social groups
may include clubs, co-workers, religious groups, professional
organizations, sports teams, gangs, and online communities.
Some examples of small social connections include family members,
intimate partners, mentors, colleagues, and confidants. Humans need to
love and be loved – both sexually and non-sexually – by others.
Many people become susceptible to loneliness, social anxiety, and
clinical depression in the absence of this love or belonging element.
Deficiencies due to hospitalism, neglect, shunning, ostracism, etc. can
adversely affect the individual's ability to form and maintain emotionally
significant relationships in general.”
Vertical
Needs
Physiological Needs
Safety Needs
Friendship Family
Horizontal Needs: Social
Intimacy
Maslow's hierarchy of needs:
Esteem needs
A bit of motivation
“Esteem needs are ego needs or status needs. People develop a concern
with getting recognition, status, importance, and respect from others.
Most humans need to feel respected; this includes the need to have self-
esteem and self-respect. Esteem presents the typical human desire to
be accepted and valued by others. People often engage in a profession
or hobby to gain recognition. These activities give the person a sense of
contribution or value.
Low self-esteem or an inferiority complex may result from imbalances
during this level in the hierarchy. Psychological imbalances such as
depression can distract the person from obtaining a higher level of self-
esteem.
Most people have a need for stable self-respect and self-esteem.
Maslow noted two versions of esteem needs: a "lower" version and a
"higher" version. This means that esteem and the subsequent levels are
not strictly separated; instead, the levels are closely related.”
Vertical
Needs
Physiological Needs
Safety Needs
Social Needs
Recognition Trust
Horizontal Needs: Esteem
Respect
Maslow's hierarchy of needs:
Self-actualization needs
A bit of motivation
“This level of need refers to the realization of one's full potential.
Maslow describes this as the desire to accomplish everything that one
can, to become the most that one can be. People may have a strong,
particular desire to become an ideal parent, succeed athletically, or
create paintings, pictures, or inventions.
To understand this level of need, a person must not only succeed in the
previous needs but master them. Self-actualization can be described as
a value-based system when discussing its role in motivation. Self-
actualization is understood as the goal or explicit motive, and the
previous stages in Maslow's Hierarchy fall in line to become the step-by-
step process by which self-actualization is achievable; an explicit motive
is the objective of a reward-based system that is used to intrinsically
drive completion of certain values or goals. Individuals who are
motivated to pursue this goal seek and understand how their needs,
relationships, and sense of self are expressed through their behavior.
Self-actualization can include: Partner Acquisition, Parenting, Utilizing &
Developing Talents & Abilities, Pursuing goals.”
Vertical
Needs
Physiological Needs
Safety Needs
Social Needs
Esteem Needs
Parenting Goals
Horizontal Needs: Self-actualization
Talents
Maslow's hierarchy of needs:
Transcendence needs
A bit of motivation
“In his later years, Abraham Maslow explored a further dimension of
motivation, while criticizing his original vision of self-actualization.
By these later ideas, one finds the fullest realization in giving oneself to
something beyond oneself—for example, in altruism or spirituality. He
equated this with the desire to reach the infinite.
Transcendence refers to the very highest and most inclusive or holistic
levels of human consciousness, behaving and relating, as ends rather
than means, to oneself, to significant others, to human beings in general,
to other species, to nature, and to the cosmos” Maslow 1971, p. 269
Vertical
Needs
Physiological Needs
Safety Needs
Social Needs
Esteem Needs
Self-actualization Needs
Transcendence
Computing hierarchy of needs:
Physical needs
A bit of motivation
Information
Representation Materials Power Thermal
Horizontal Needs: Physiological
Vertical
Needs
. . .
Claude Shannon
The origins of information theory
Image source: Wikipedia
He founded digital circuit design theory in 1937, when he wrote his thesis demonstrating that electrical applications of Boolean algebra could construct any logical, numerical relationship.
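A minimal Python sketch of that idea, not taken from Shannon's thesis: binary addition, a numerical relationship, is built purely from the Boolean operations AND, OR and XOR. The function names and the 8-bit default width are assumptions made for illustration.

```python
def half_adder(a, b):
    """Add two bits with Boolean operations only; returns (sum_bit, carry_bit)."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry using two half adders and an OR."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_add(x, y, width=8):
    """Add two unsigned integers bit by bit; the result wraps at 2**width."""
    result, carry = 0, 0
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result


print(ripple_add(23, 42))  # 65
```

Each `full_adder` call corresponds to a handful of logic gates, so the loop is a software stand-in for a chain of such gates in hardware.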
Assumption: separation of information
from physics -> that separation is being
“challenged” by quantum computing
today.
0
1
Prior to Shannon those things had nothing in
common.
Today we get to see them both as processors
or carriers of information.
12-row/80-column IBM punched card from the mid-
twentieth century, Image source: Wikipedia
A section of DNA. The bases lie vertically between
the two spiraling strands, Image source: Wikipedia
0+1
Computing hierarchy of needs: Technological needs
A bit of motivation
Horizontal Needs: Technological
Vertical
Needs
. . .
Physical Needs
Transistor Variability Yield
Aging
. . .
H.-S. P. Wong et al., "A Density Metric for Semiconductor Technology [Point of View]," in Proceedings of the IEEE, vol. 108, no. 4, pp. 478-482, April 2020, doi: 10.1109/JPROC.2020.2981715.
Moore’s Law End ? Really ?
— “medium-K, oxide-minimized, semi-strained, anti-dielectric half-pitch.” ?
Transistor scaling
Intel, IEDM 2019,
Germanium-based
GAAFET PMOS
device layer on top
of a more traditional
silicon FinFET NMOS
System scaling: Beyond the
transistor, e.g. Intel’s EMIB
(Embedded Multi-die
Interconnect Bridge) and
Foveros to connect chiplets
in both 2 and 3 dimensions
(HBM in CPU-GPU)
• Five chipmakers/foundries are in the 16nm/14nm market: GlobalFoundries, Intel, Samsung, TSMC, and UMC, plus SMIC (14nm finFETs).
• GlobalFoundries and UMC last year halted their respective 7nm process efforts.
• Currently, TSMC's 7nm process is at its peak (orders from AMD for its Ryzen 3000-series CPUs and Navi graphics cards). Huge investment in 5nm.
• Compared to 7nm, Samsung’s 5nm finFET technology provides up to a 25% increase in logic area efficiency with 20% lower power or 10% higher performance.
• TSMC expects 3nm mass production in 2022.
• A nanosheet FET is a type of gate-all-around (GAA) architecture. That’s not the only possible scenario. “The industry is very conservative. They will try to extend the finFET as much as possible,” IMEC’s Naoto Horiguchi said. “At 3nm, we have a window to use a finFET. But we need several process innovations for finFET in terms of overall improvement.”
• TSMC announced starting 2nm development (Apr. 2020)
https://semiengineering.com/5nm-vs-3nm/
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
https://semiwiki.com/eda/synopsys/294205-what-might-the-1nm-node-look-like/
Nadine Collaert, "1.3 Future Scaling: Where Systems and Technology Meet," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 25-29, doi: 10.1109/ISSCC19947.2020.9063033.
o “10 micron” in 1972 through “0.35 micron” in 1995, an impressive 23-year run where the node name matched gate length.
o Then, in 1997 with the “0.25 micron/250 nm” node they started over-achieving with an actual Lg of 200 nm – 20% better
than the name would imply.
o This “sandbagging” continued through the next 12 years, with one node (130nm) having Lg of only 70nm – almost a 2x
buffer. Then, in 2011, Intel jumped over to the other side of the ledger, ushering in what we might call the “overstating
decade” with the “22nm” node sporting an Lg of 26 nm. Since then, things have continued to slide further in that direction,
with the current “10nm” node measuring in with an Lg of 18 nm – almost 2x on the other side of the “named” dimension.
o Most industry folks understand that Intel’s “10nm” process is roughly equivalent to TSMC and Samsung’s “7nm” processes.
https://www.eejournal.com/article/no-more-nanometers/ July 23, 2020
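The gap between node names and gate lengths quoted above can be checked with a few lines of Python. This is only a sketch: the four (name, Lg) pairs are the ones given in the quoted text, and the helper name `lg_over_name` is made up for illustration.

```python
# (node name in nm, actual gate length Lg in nm), as quoted in the text above
NODES = [(250, 200), (130, 70), (22, 26), (10, 18)]


def lg_over_name(node_nm, lg_nm):
    """Below 1: Lg is smaller than the name implies. Above 1: the name overstates."""
    return lg_nm / node_nm


for node_nm, lg_nm in NODES:
    ratio = lg_over_name(node_nm, lg_nm)
    print(f"{node_nm:>4} nm node: Lg = {lg_nm:>3} nm, Lg/name = {ratio:.2f}")
```

A ratio of 0.80 for the 250 nm node matches the "20% better than the name" remark, and 1.80 for the "10nm" node matches "almost 2x on the other side".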
Moore’s Law End ? Really ?
— I prefer to respect that it is aging fairly gracefully!
Transistor scaling Cost scaling
• The cost to design a 28nm planar device ranges from $10 million to
$35 million (Gartner).
• The cost to design a 7nm system-on-a-chip (SoC) ranges from $120
million to $420 million (Gartner).
• 5nm is a completely new process with updated EDA tools and IP. The
cost to design a 5nm device ranges from $210 million to $680 million
(Gartner).
Up to $20B per fab at 3nm (IBS)
• NRE costs include R&D for optical proximity correction, multi-patterning, and extreme ultraviolet (EUV) lithography.
https://www.eejournal.com/article/no-more-nanometers/
• After 5nm, the next full node is 3nm. But 3nm is not for the faint of heart.
• The cost to design a 3nm device ranges from $500 million to $1.5 billion,
according to IBS.
• Process development costs range from $4 billion to $5 billion, while a fab
runs $15 billion to $20 billion, according to IBS.
• “Transistor costs at 3nm are expected to be 20% to 25% higher than at
5nm based on same level of maturity,” IBS’ Jones said. “Expect 15% more
performance with 25% less power consumption compared to 5nm
finFETs.”
https://semiengineering.com/5nm-vs-3nm/
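To see how steeply design cost grows across the nodes listed above, here is a small Python sketch. The ranges are the ones quoted on the slide; using the midpoint of each range is a simplifying assumption, and the names `DESIGN_COST_MUSD`, `midpoint` and `growth_vs_28nm` are hypothetical.

```python
# Design cost ranges in millions of USD, as quoted above
# (28nm, 7nm and 5nm from Gartner; 3nm from IBS).
DESIGN_COST_MUSD = {
    "28nm": (10, 35),
    "7nm": (120, 420),
    "5nm": (210, 680),
    "3nm": (500, 1500),
}


def midpoint(low, high):
    """Midpoint of a cost range; an assumed summary statistic, not a quoted figure."""
    return (low + high) / 2


def growth_vs_28nm(node):
    """How many times more expensive a design is than at 28nm, midpoint to midpoint."""
    return midpoint(*DESIGN_COST_MUSD[node]) / midpoint(*DESIGN_COST_MUSD["28nm"])


for node in DESIGN_COST_MUSD:
    print(f"{node:>4}: {growth_vs_28nm(node):.1f}x the 28nm midpoint")
```

Under this midpoint assumption, a 7nm design costs about 12 times as much as a 28nm one, and a 3nm design more than 40 times as much.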
1. Computer performance was driven by clock speeds.
2. Facing several physical walls, the driver shifted from clock speed to parallelism (Amdahl’s law: parallelism will soon be limited for nonscientific computations).
3. An unavoidable path towards specialized devices (such as ASICs, DPUs, TPUs, IPUs, ...). Thus, computer performance will probably need to seek another driving factor.
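The limit that Amdahl's law places on parallelism in step 2 can be sketched in Python. The 90% parallel fraction is an assumed example value, not a figure from the talk.

```python
def amdahl_speedup(parallel_fraction, n_cores):
    """Overall speedup when only `parallel_fraction` of the work scales with cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# Assumed workload: 90% of the run time parallelizes perfectly.
for n in (2, 10, 100, 1000):
    print(f"{n:>5} cores: {amdahl_speedup(0.9, n):.2f}x")
```

With a 10% serial part the speedup can never reach 10x, however many cores are added, which is why more cores alone cannot remain the driving factor.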
Computing hierarchy of needs:
Architectural needs
A bit of motivation
Horizontal Needs: Architectural
Vertical
Needs
. . .
Physical Needs
. . .
Technological Needs
(μ)-architecture
Edge-to-cloud, HPC Storage
Gordon Moore’s law and its derivatives; T: Transistor total, Klauer, Bernd. “The Convey Hybrid-Core Architecture.” (2013).
1
2
3
Latency, Cost, …,
Energy efficiency
. . .
Computing hierarchy of needs:
Software needs
A bit of motivation
Horizontal Needs: Software
Vertical
Needs
. . .
Physical Needs
. . .
Technological Needs
. . .
Languages Libraries Virtualization
Architectural Needs
. . .
Computing hierarchy of needs: Cloud needs
A bit of motivation
Horizontal Needs: Cloud
Vertical
Needs
. . .
Physical Needs
. . .
Technological Needs
Architectural Needs
Software Needs
PaaS, IaaS, …, FaaS 5G/6G Edge, IoT, V2X
. . .
https://amulya-bhatia.medium.com/iaas-vs-caas-vs-paas-vs-faas-vs-saas-whats-the-difference-ee84ecc2d519
. . .
Comparison of the 4G (IMT-Advanced) and
5G (IMT-2020) specifications. Source: ETSI
General AI
Revolutionary
True neuro-AI
Cross-domain learning
and reasoning
Broad autonomy with
moral reasoning
Wetware?
Transcendence ?
(personal view)
Broad AI
Disruptive and Pervasive
Neuro-symbolic AI
Multi-task, multi-domain, multi-
modal
Trusted AI capable of learning with
much less data
Reduced Precision and Analog HW
Computing hierarchy of needs:
AI needs
A bit of motivation
. . .
Physical Needs
. . .
Technological Needs
Architectural Needs
Software Needs
“Transcendence refers to the very highest and most inclusive or
holistic levels of human consciousness, behaving and relating, as
ends rather than means, to oneself, to significant others, to human
beings in general, to other species, to nature, and to the cosmos”
Maslow 1971, p. 269
AI
Narrow AI
Emerging
Deep Learning
Single-task, single-domain, with
superhuman accuracy
Requires large amounts of labeled
data
CPU & GPU
We are here now
Cloud Needs
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
. . .
. . .
My view on motivation for human-centric trusted AI
A bit of motivation
. . .
Physical Needs
. . .
Technological Needs
Architectural Needs
Software Needs
AI
Cloud Needs
. . .
. . .
Physiological Needs
Safety Needs
Social Needs
Esteem Needs
Self-actualization Needs
Transcendence
Augment human2human & human2cosmos consciousness
Computing hierarchy of needs Maslow's hierarchy of needs
https://aif360.mybluemix.net/
https://www.ibm.com/blogs/research/2018/04/ai-adversarial-robustness-toolbox/
http://aix360-dev.mybluemix.net/?_ga=2.230889183.1995265854.1610364654-99329142.1609856291
https://www.research.ibm.com/artificial-intelligence/trusted-ai/
Bits
Mathematics + Information
Today’s Computers and
Supercomputers
Neurons
Biology + Information
Today’s AI Systems
Qubits
Physics + Information
Today’s Quantum Systems
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and
William M. J. Green, ISSCC2020: Plenary
How we
get there
A bit of motivation
[Figure: roofline plot. Y-axis: attainable performance, G(FL)OPS; x-axis: computation-to-communication ratio, (FL)OP/Byte; CPU computation roof. Block diagram: CPU and memory.]
“In 1945, while consulting for the Moore School
of Electrical Engineering on the EDVAC project,
von Neumann wrote an incomplete set of notes,
titled the First Draft of a Report on the EDVAC.
This widely distributed paper laid foundations of a
computer architecture in which the data and the
program are both stored in the computer’s
memory in the same address space, which will be
described later as von Neumann Architecture
(drawing at right). This architecture became the
de facto standard for a long time and is still used
today (until technology enabled more advanced
architectures).”
https://history-computer.com/john-von-neumann-biography-history-and-inventions/
A bit of motivation
[Figure: roofline plot. Y-axis: attainable performance, G(FL)OPS; x-axis: computation-to-communication ratio, (FL)OP/Byte; CPU computation roof; memory-bound and compute-bound regions. Block diagram: CPU and memory.]
Optimal operation point (bandwidth and CPU are not under-utilized)
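The plot on this slide follows the roofline model: attainable performance is the smaller of the compute roof and memory bandwidth multiplied by the computation-to-communication ratio. A minimal Python sketch, in which the function names and the machine numbers (1000 GFLOP/s peak, 100 GB/s memory bandwidth) are assumptions chosen for illustration and not figures from the talk:

```python
def attainable_gflops(peak_gflops, mem_bw_gbs, intensity_flop_per_byte):
    """Roofline: the lower of the compute roof and the memory-bandwidth roof."""
    return min(peak_gflops, mem_bw_gbs * intensity_flop_per_byte)


def ridge_point(peak_gflops, mem_bw_gbs):
    """Intensity at which both roofs meet: the 'optimal operation point'."""
    return peak_gflops / mem_bw_gbs


PEAK, BW = 1000.0, 100.0          # hypothetical CPU: GFLOP/s and GB/s
print(ridge_point(PEAK, BW))      # 10.0 FLOP/Byte
for intensity in (0.25, 10.0, 50.0):
    perf = attainable_gflops(PEAK, BW, intensity)
    regime = "memory-bound" if intensity < ridge_point(PEAK, BW) else "compute-bound"
    print(f"{intensity:>6} FLOP/Byte -> {perf:>7.1f} GFLOP/s ({regime})")
```

Left of the ridge point the memory system limits performance; right of it the CPU does. Exactly at the ridge point neither bandwidth nor compute is under-utilized.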
A bit of motivation
[Figure: roofline plot, same axes as before, with memory-bound and compute-bound regions and a future CPU computation roof.]
Amdahl’s Law & Dark Silicon: the future is not 1000s of conventional cores.
“Amdahl’s Law of specialization”: is it better to speed up 1% of apps by 100× or all apps by 1%?
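The question can be turned into arithmetic under a loudly stated assumption: 100 applications that each take the same share of total run time, with "by 1%" read as running 1.01 times faster. Neither assumption comes from the slide, and `aggregate_speedup` is a hypothetical helper.

```python
def aggregate_speedup(times, speedups):
    """Total run time before divided by total run time after per-app speedups."""
    before = sum(times)
    after = sum(t / s for t, s in zip(times, speedups))
    return before / after


times = [1.0] * 100                                   # assumed: equal run-time shares

one_app_100x = aggregate_speedup(times, [100.0] + [1.0] * 99)
all_apps_1pct = aggregate_speedup(times, [1.01] * 100)

print(f"1% of apps by 100x: {one_app_100x:.4f}x overall")
print(f"all apps by 1%:     {all_apps_1pct:.4f}x overall")
```

Under these assumptions both options give an overall speedup of roughly 1.01x. The aggregate number does not settle the question; what differs is which applications benefit.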
A bit of motivation
[Figure: roofline plot, same axes as before, with memory-bound and compute-bound regions. Block diagram: CPU with memory, accelerator with memory.]
System specialization using accelerators: architectures designed with a specific class of computations in mind
A bit of motivation
[Figure: roofline plot, same axes as before, with Application A (specifications) marked at its arithmetic intensity.]
Arithmetic intensity of A (depends only on the application’s characteristics)
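As a rough illustration of why arithmetic intensity is a property of the application, the Python sketch below computes it for two stand-in kernels. The kernels (SAXPY and a dense matrix multiply on 32-bit floats) and the assumption that each operand crosses the memory interface exactly once are illustrative choices, not details of "Application A" in the talk.

```python
BYTES_PER_FLOAT32 = 4


def saxpy_intensity():
    """y = a*x + y: 2 FLOPs per element; x and y are read and y is written."""
    flops_per_element = 2
    bytes_per_element = 3 * BYTES_PER_FLOAT32
    return flops_per_element / bytes_per_element


def matmul_intensity(n):
    """C = A @ B for n x n matrices: 2*n**3 FLOPs over three matrices moved once."""
    flops = 2 * n**3
    bytes_moved = 3 * n * n * BYTES_PER_FLOAT32
    return flops / bytes_moved


print(f"SAXPY:            {saxpy_intensity():.3f} FLOP/Byte")
print(f"matmul, n = 1024: {matmul_intensity(1024):.1f} FLOP/Byte")
```

Neither result depends on the machine: SAXPY sits far to the left on the x-axis whatever hardware runs it, while a large matrix multiply sits far to the right.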
A bit of motivation
[Figure: roofline plot, same axes as before, marking the arithmetic intensity of Application A and its needs (perf., power).]
Computing performance “needs” for Application A
A bit of motivation
[Figure: roofline plot, same axes as before, marking the arithmetic intensity of Application A, its computing performance “needs”, and its baseline performance.]
Coding App. A (C, C++, Java, Python): baseline performance for A
A bit of motivation
[Figure: roofline plot for Application A, same axes as before, with its arithmetic intensity, performance “needs” and baseline performance. New point: A w/ comp. opt.]
Compiler optimizations (gcc -O3 …)
A bit of motivation
[Figure: roofline plot for Application A, same axes as before. Points so far: baseline, A w/ comp. opt. New point: A w/ multi-core.]
Multi-core (Pthreads, OpenMP, …)
A bit of motivation
[Figure: roofline plot for Application A, same axes as before. Points so far: baseline, A w/ comp. opt., A w/ multi-core. New point: A w/ SIMD.]
SIMD (SSE, AVX, …)
A bit of motivation
[Figure: roofline plot for Application A, same axes as before. Points so far: baseline, A w/ comp. opt., A w/ multi-core, A w/ SIMD. New point: A w/ DVFS.]
DVFS (freq. boost, …)
A bit of motivation
[Figure: roofline plot for Application A, same axes as before. Points so far: baseline, A w/ comp. opt., A w/ multi-core, A w/ SIMD, A w/ DVFS. New point: A w/ manual code opt.]
Manual code optimization (profiling and fun…)
A bit of motivation
[Figure: roofline plot for Application A, same axes as before, with all CPU-side points (comp. opt., multi-core, SIMD, DVFS, manual code opt.). New point: A w/ accelerator I. Block diagram: CPU with memory, Accelerator I with memory.]
Buy accelerator I (GPU, TPU, ASIC…)
Assuming CPU-Acc. BW is sufficient!
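The caveat about CPU-to-accelerator bandwidth can be made concrete with a simple timing model. This is a sketch under stated assumptions: data transfer and computation do not overlap, and all numbers (100 GFLOP of work, 8 GB of data, a 50 GFLOP/s CPU, a 1000 GFLOP/s accelerator, 16 GB/s and 4 GB/s links) are hypothetical.

```python
def cpu_time_s(work_gflop, cpu_gflops):
    """Time to run the whole job on the CPU."""
    return work_gflop / cpu_gflops


def offload_time_s(work_gflop, data_gbytes, acc_gflops, link_gbs):
    """Time to ship the data over the CPU-accelerator link, then compute there."""
    return data_gbytes / link_gbs + work_gflop / acc_gflops


WORK, DATA = 100.0, 8.0                     # GFLOP of work, GB moved over the link
on_cpu = cpu_time_s(WORK, 50.0)
fast_link = offload_time_s(WORK, DATA, 1000.0, 16.0)
slow_link = offload_time_s(WORK, DATA, 1000.0, 4.0)

print(f"CPU only:                {on_cpu:.2f} s")
print(f"accelerator, 16 GB/s link: {fast_link:.2f} s")
print(f"accelerator,  4 GB/s link: {slow_link:.2f} s")
```

With the slower link the accelerator run takes longer than staying on the CPU, even though the accelerator computes 20 times faster: the link, not the compute roof, becomes the bottleneck.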
A bit of motivation
[Figure: roofline plot for Application A, same axes as before, with all CPU-side points and A w/ accelerator I (HW). New point: A w/ accelerator I (HW+SW). Block diagram: CPU with memory, Accelerator I with memory.]
Use vendor libs of accelerator I (cuBLAS, cuDNN, …)
Assuming CPU-Acc. BW is sufficient!
A bit of motivation
[Figure: roofline plot, same axes as before, with all points for Application A so far, plus Application B with its own arithmetic intensity and computing performance “needs”. Block diagram adds Accelerator II.]
Application B: ?
A bit of motivation
[Figure: roofline plot, same axes as before, with all points for Application A so far and Application B with its arithmetic intensity and computing performance “needs”. Block diagram: CPU with memory, Accelerator II with memory.]
Buy Acc. II with better memory (HBM2, …)
A bit of motivation
Attainable performance: G(FL)OPS
Computation to communication ratio: (FL)OP/Byte
[Roofline plot. Labels: CPU computation roof, CPU memory, memory.]
Application A: baseline performance for A, arithmetic intensity of A, computing performance “needs” for A
Steps for A: comp. opt., multi-core, SIMD, DVFS, manual code opt., accelerator I (HW), accelerator I (HW+SW)
Application B: arithmetic intensity of B, computing performance “needs” for B
Further applications: Application C, Application D, Application E
Next step: buy Accelerators III, IV, V …
A bit of motivation
Attainable performance: G(FL)OPS
Computation to communication ratio: (FL)OP/Byte
[Roofline plot. Labels: CPU computation roof, CPU memory, FPGA.]
Application A: baseline performance for A, arithmetic intensity of A, computing performance “needs” for A
Steps for A: comp. opt., multi-core, SIMD, DVFS, manual code opt., accelerator I (HW), accelerator I (HW+SW)
Application B: arithmetic intensity of B, computing performance “needs” for B
Next step: buy FPGA
o custom logic,
o custom memory,
o custom interconnects
FPGA serves Application A, Application B, Application C, Application D, Application E
Not only custom, but also reconfigurable at seconds’ speed!
✓ Reconfigurable logic
✓ Reconfigurable memory
✓ Reconfigurable interconnects
ASICs
A GPU is effective at processing the same set of operations in parallel – single instruction, multiple data (SIMD).
A GPU has a well-defined instruction-set, and fixed word sizes – for example single, double, or half-precision
integer and floating point values.
▪ An FPGA is effective at processing the same or different operations in parallel – multiple instructions, multiple data (MIMD).
An FPGA does not have a predefined instruction-set, or a fixed data width.
Figures source: AWS - Announcing Amazon EC2 F1 Instances with Custom FPGAs, Bringing Hardware Acceleration closer to the programmer, Ecoscale-ExaNest workshop, 2017
Silicon alternatives for rapid enterprise-ready specialization
ASICs vs GPUs vs TPUs vs DPUs vs FPGAs vs Apples vs …
WP492 (v1.0.1) June 13, 2017, Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems
Why FPGAs ?
GPUs
FPGAs
How does comparable raw performance between FPGAs and GPUs bring growth?
TPUs vs
GPUs vs
FPS/TOPS
FPGAs vs
GPUs vs
ASICs vs
After Google announced the scaling capabilities of TPUv4 [1], Nvidia adopted the "per-chip" metric
for A100 [2]. So what is the “right” granularity, and who defines it? (No one!) MLPerf has a diverse set
of benchmarks that unveil various system bottlenecks [3], but it favors FLOPS, a game where
FPGAs are not at their strongest. There are a zillion companies out there doing inference [4], and
they make a lot of claims, but who is going to have the biggest ROI for improving results?
[1] https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer
[2] https://www.eetimes.com/nvidia-google-both-claim-mlperf-training-crown/
[3] https://ieeexplore.ieee.org/document/9238612
[4] https://basicmi.github.io/AI-Chip/
“If you’re not sure of the optimal algorithms for
say compression or encryption for the data you’re
processing, or the data shape is going to be
changing over time so you don’t want to take the
risk of burning it to the silicon, you can
experiment and be agile on FPGAs”
Azure Chief Technology Officer, Mark Russinovich
Microsoft’s cloud strategy
favors FPGAs
The impact of FPGAs on query latency for Bing; even at double the query load FPGA-accelerated ranking has
lower latency than software-powered ranking at any load.
A. M. Caulfield et al., "A cloud-scale acceleration architecture," 2016 49th Annual IEEE/ACM International
Symposium on Microarchitecture (MICRO), Taipei, 2016, pp. 1-13, doi: 10.1109/MICRO.2016.7783710.
… from a research vision in 2015 …
(Microsoft’s) Doug Burger's talk at FPL, Sep. 1st 2020
… to industry adoption in 2020!
Live Video Transcoding Launch, Aaron Behman, Director of Video Product Marketing, Data Center Group
https://www.xilinx.com/publications/presentations/video-transcoding-media-deck.pdf
Definition: Capital expenditures
(CapEx) are funds used by a
company to acquire, upgrade, and
maintain physical assets such as
property, plants, buildings,
technology, or equipment. CapEx is
often used to undertake new
projects or investments by a
company. Capital Expenditure (CapEx)
Definition - Investopedia
www.investopedia.com
Hype source: Gartner,2020, https://www.gartner.com/en/documents/3988006/hype-cycle-for-artificial-intelligence-2020
What’s Next in
FPGAs for
cloud is
agility
How do we envision increasing FPGA agility?
Many enterprises experience a
steady decline in their ability to
coordinate & operationalize FPGA
projects — particularly at scale.
Our visionary journey of FPGAs for cloud
1. Infra
2. Software
3. Automation
4. Composability
Our visionary journey of FPGAs for cloud
1. Infra
Accelerator I
What’s Next in FPGAs for
cloud is agility
Attainable performance: G(FL)OPS
Computation to communication ratio: (FL)OP/Byte
CPU Computation Roof CPU
Memory
Arithmetic intensity of A
Computing performance “needs” for Application A
Use FPGAs with
high CPU-FPGA
interconnect BW
Memory
Application A
1 A w/ accelerator I (HW+SW)
Assuming CPU-Acc.
BW is sufficient !!!
2
Performance drops
due to inefficient
CPU-Accelerator BW
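The performance drop caused by the CPU-accelerator link can be sketched by adding a third ceiling to the roofline. This is a simplified model that assumes, as a worst case, that every byte the kernel touches also crosses the link; the function name and all numbers are hypothetical.

```python
def attainable_with_link(peak_gops, mem_bw_gb_s, link_bw_gb_s, intensity_op_per_byte):
    """Roofline with an extra ceiling for data crossing the CPU-accelerator link.

    Simplifying assumption: all data streams over the link, with no reuse
    of data already resident on the accelerator.
    """
    return min(
        peak_gops,
        mem_bw_gb_s * intensity_op_per_byte,
        link_bw_gb_s * intensity_op_per_byte,
    )

# Hypothetical accelerator and application (not taken from the slides).
PEAK_GOPS = 5000.0
MEM_BW_GB_S = 900.0
INTENSITY = 10.0  # (FL)OP/Byte

slow_link = attainable_with_link(PEAK_GOPS, MEM_BW_GB_S, 16.0, INTENSITY)
fast_link = attainable_with_link(PEAK_GOPS, MEM_BW_GB_S, 50.0, INTENSITY)
print(f"16 GB/s link: {slow_link} G(FL)OPS, 50 GB/s link: {fast_link} G(FL)OPS")
```

Under these assumptions the link, not the accelerator, sets the bound in both cases, which is the motivation for a high-bandwidth CPU-FPGA interconnect.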
cloudFPGA and OpenCAPI-FPGAs in a nutshell
FPGA as a Co-Processor
POWER9 AC922 + V100 + 9V3/9H7
Up to 8 OpenCAPI FPGAs per 2U chassis.
Research in OpenCAPI-attached FPGAs
Weather modeling (http://www.cosmo-model.org)
o Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner,
Sander Stuijk, Henk Corporaal: NARMADA: Near-Memory Horizontal
Diffusion Accelerator for Scalable Stencil Computations. FPL 2019
o Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan
Gómez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal: NERO: A Near
High-Bandwidth Memory Stencil Accelerator for Weather Prediction
Modeling. FPL 2020
In-Memory Data Analytics
o Kaan Kara, Christoph Hagleitner, Dionysios Diamantopoulos, Dimitris Syrivelis,
Gustavo Alonso: High Bandwidth Memory on FPGAs: A Data Analytics
Perspective. FPL 2020
Genomics
o Abbas Haghi, Lluc Alvarez, Jordà Polo, Dionysios
Diamantopoulos, Christoph Hagleitner, Miquel
Moretó: A Hardware/Software Co-Design of K-mer
Counting Using a CAPI-Enabled FPGA. FPL 2020
Image source: payodsoft.com
What’s Next in FPGAs for
cloud is agility
Attainable performance: G(FL)OPS
Computation to communication ratio: (FL)OP/Byte
CPU Computation Roof CPU
Memory
Buy FPGA
o custom logic,
o custom memory,
o custom interconnects
FPGA
Application A
Application B
Application C
Application D
Application L
I have 12 applications, but in one 2U node I can only attach up to 8 FPGAs.
Well, multi-tenancy is also available in FPGAs, but the scaling problem remains in the long term…
Not only custom, but also reconfigurable at seconds’ speed!
What’s Next in FPGAs for cloud
is agility
Computation to communication ratio: (FL)OP/Byte
Use cloudFPGA
o End CPU slavery!
o Deploy FPGAs at large
scale in hyperscale DCs
FPGA
Application A
Application B
Application C
Application D
Application E
~1000 FPGAs / rack
FPGA
Excuse my artistic and simplifying vision that ignores:
o the bindings of an application to SW libraries,
o the runtime options,
o the scheduler policies,
o the resource-manager policies,
o the control-plane and data-plane management,
o the security,
o and many, many more …
THINK BIG!
Attainable performance: G(FL)OPS
cloudFPGA and OpenCAPI-FPGAs in a nutshell
FPGA as a Co-Processor: POWER9 AC922 + V100 + 9V3/9H7. Up to 8 OpenCAPI FPGAs per 2U chassis.
FPGA as a Peer-Processor: 64 FPGAs in one 19"×2U chassis (64-port 10GbE = 640 Gb/s BW).
In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
cloudFPGA and OpenCAPI-FPGAs in a nutshell
FPGA as a Co-Processor FPGA as a Peer-Processor
https://github.com/...<STAY_TUNED>
https://github.com/OpenCAPI/oc-accel
cloudFPGA
concept
Highlights
• dense
→ chassis w/ 64 compute units
→ ~1000 FPGAs / rack
• integration of 1st level switch
→ full cross-sectional BW
→ low cost (cables / rack space)
• energy efficient
→ no SW/FW overhead
→ no CPU overhead
→ (hot) water cooling
• self-hosted / network-attached
→ bare-metal support
→ scalable
IP Address: 10.10.1.9
DRAM: 8 GB, BRAM: 38 Mb
CLBs: 660,000
DSPs: 2760
IP Address: 10.10.1.50
DRAM: 32GB, Cores: 4
The FPGA becomes the node !
Goal → Deploy FPGAs at large scale in hyperscale DCs
1-10s of thousands per DC
Standalone network-attached FPGA
1. Replace PCIe I/F with integrated NIC (iNIC).
2. Turn FPGA card into a self-contained appliance.
3. Replace transceivers w/ backplane connectivity.
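Once the FPGA is a standalone, IP-addressable appliance, a host reaches it with ordinary sockets instead of a PCIe driver. The sketch below illustrates that idea only: `fake_fpga_role` is a local stand-in thread for the FPGA kernel, and the upper-casing "kernel", port selection and message format are hypothetical, not the cloudFPGA API.

```python
import socket
import threading

def fake_fpga_role(sock):
    """Local stand-in for an FPGA kernel: answer one datagram, upper-cased."""
    data, addr = sock.recvfrom(2048)
    sock.sendto(data.upper(), addr)

def offload(payload, node_addr, timeout_s=2.0):
    """Send one request datagram to a network-attached node and wait for the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as host:
        host.settimeout(timeout_s)
        host.sendto(payload, node_addr)
        reply, _ = host.recvfrom(2048)
        return reply

# Bind the stand-in node to an ephemeral local port before the host sends,
# so the request is queued even if the worker thread has not started yet.
node = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
node.bind(("127.0.0.1", 0))
worker = threading.Thread(target=fake_fpga_role, args=(node,), daemon=True)
worker.start()

result = offload(b"hello fpga", node.getsockname())
worker.join()
node.close()
print(result)
```

In a real deployment the address passed to `offload` would be the IP address of the FPGA node rather than a loopback stand-in.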
One carrier sled = 32 FPGA modules
1. Our first FPGA module uses a Xilinx Kintex Ultrascale KU060
o A mid-range FPGA with high performance/price and low wattage
Two carrier sleds per chassis = 64 FPGAs
Sixteen chassis per rack = 1024 FPGAs
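The packaging arithmetic from the last three slides can be checked in a few lines. The per-level counts and the 10GbE port speed are the figures stated in this deck; the variable names are only illustrative.

```python
# Figures stated on the slides.
MODULES_PER_SLED = 32      # one carrier sled = 32 FPGA modules
SLEDS_PER_CHASSIS = 2      # two carrier sleds per chassis
CHASSIS_PER_RACK = 16      # sixteen chassis per rack
PORT_SPEED_GBPS = 10       # one 10GbE port per FPGA

fpgas_per_chassis = MODULES_PER_SLED * SLEDS_PER_CHASSIS
fpgas_per_rack = fpgas_per_chassis * CHASSIS_PER_RACK
chassis_bw_gbps = fpgas_per_chassis * PORT_SPEED_GBPS

print(f"{fpgas_per_chassis} FPGAs per chassis")
print(f"{fpgas_per_rack} FPGAs per rack")
print(f"{chassis_bw_gbps} Gb/s aggregate port bandwidth per chassis")
```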
cloudFPGA
Compute density - S822LC (aka Minsky) vs FPGA chassis
~x2 INT8 TOPS
~x4 INT4 TOPS
~x8 INT2 TOPS
~x16 Bin TOPS
Our visionary journey of FPGAs for cloud: core themes
1. Infra
2. Software
Who we built this for
cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.”
cF developer #2: “I want to create or reuse RTL/HLS kernels while using standard APIs whenever possible.”
cF developer #3: “I need to accelerate an application. I don’t know RTL/HLS and hardware design.”
cF (Nirvana) developer #4: “I wasn’t aware the service I am using involved cloudFPGA.”
Tools
❑ cloudFPGA Studio
✓ Host: cFPy, Jupyter Lab
✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL
❑ cFDK
✓ Host: ZRLMPI, cFPy, OpenROLE (SW)
✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with OpenROLE (HW)
❑ cFDK
✓ Host: Custom API with TCP/UDP
✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with AXI I/F
❑ User’s front-end application integrates with cloud-native cF software that leverages cF nodes transparently.
✓ Host: gRPC, REST API, ...
✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL
Software action #1
cFDK REST API
70
cF developer #1
“I want to create or reuse my RTL/HLS
designs while designing HW and SW
middleware.”
Disclaimer: Hardware in FPGA world can be software too! ☺
e.g. “FPGAs for Software Programmers”, Dirk Koch, Frank Hannig, and Daniel Ziener. 2016. Springer Publishing Company, Incorporated.
DONE
Software action #2
cFDK REST API
71
cF developer #1
cF developer #2
“I want to create or reuse my RTL/HLS
designs while designing HW and SW
middleware.”
“I want to create or reuse RTL/HLS
kernels while using standard APIs
whenever possible.”
On
track
FCCM2020 Workshop: THE FUTURE OF FPGA-ACCELERATION IN CLOUD AND DATA CENTERS
openRole: Do we need a POSIX for FPGAs? Burkhard Ringlein, IBM Research Europe
http://www.fccm.org/proceedings/2020/Workshops/Future_of_FPGA_Workshop/2020-04-29_openRole_workshop_public-Burkhard%20Ringlein.pdf
FPL2020 Workshop: DevOps support for Cloud FPGA platforms
openRole: Can we bring ‘Design once, run everywhere’ to FPGAs? Burkhard Ringlein, IBM Research Europe
Software action #3
Quantitative
Finance
Kernel
Weather
Modeling Kernel
Computer
Vision
Kernel
Database
Acceleration
Kernel
DSP
Kernel
Data Security
Kernel
Linear Algebra
Kernel
AI Inference Kernel
Domain-specific
Languages
Accelerated
Libraries
Custom
Accelerators
Data
Analytics
Kernel
cFDK REST API
Abstraction
levels
72
cF developer #1
cF developer #2
“I want to create or reuse my RTL/HLS
designs while designing HW and SW
middleware.”
“I want to create or reuse RTL/HLS
kernels while using standard APIs
whenever possible.”
cF developer #3
“I need to accelerate an application. I don’t
know RTL/HLS and hardware design.”
on
track
Software 1.0 Software 2.0
ML
The road to Software 2.0, M. Loukides and B. Lorica, O’Reilly, December 10, 2019
https://www.oreilly.com/radar/the-road-to-software-2-0/
Deploy Deep Learning Everywhere: Limitations
New operator
introduced by
operator fusion
optimization
potential benefit
TVM For Fun and Profit Tutorial, at FCRC 2019
Build intelligent systems with learning (offline and online)
TVM For Fun and Profit Tutorial, at FCRC 2019
End-to-end compilation flow for transprecision FPGAs
Agile Autotuning of a Transprecision Tensor
Accelerator Overlay D.Diamantopoulos et al., FPL2020
Our visionary journey of FPGAs for cloud: core themes
1. Infra
2. Software
3. Automation
Who we built this for
Container-Savvy Developer: I can run my containerized application without having to worry about sizing, creating or managing a cluster. “Run my container” vs. “Give me a cluster, that I can then run my container on”.
Functions Developer: I love Functions-as-a-Service and can now run them with almost no limits. I now have a single platform to securely combine Functions with Apps and other containerized workloads.
PaaS/IaaS Developer: I can start utilizing a new powerful platform/infra and:
- keep using a “push source code” experience
- do not have to worry about containers
- can easily connect my code to backing services
cF developers #1 to #4, with the same quotes and tools as on the earlier “Who we built this for” slide.
Software-defined multi-FPGA fabric
Our visionary journey of FPGAs for cloud: core themes
1. Infra
2. Software
3. Automation
4. Composability
Composable systems with FPGAs
Hybrid cloud
GPU
▪ Thousands of tiny CPUs using high parallelization
▪ Compute-intensive applications
▪ SIMD-oriented workloads
FPGA
▪ Logic + IOs are customized
▪ Very low and predictable latency
▪ MIMD-oriented workloads
New AI HW
cloudFPGA (https://www.zurich.ibm.com/cci/cloudFPGA)
▪ 64 FPGAs in one 19"×2U chassis (64-port 10GbE = 640 Gb/s BW).
▪ In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
HelmGemm: AI HW fractionalization
➢ Docker container service: multi-tenant environment with a high-level API to provide lightweight containers that run processes in isolation
➢ Kubernetes management: deploy, maintain, and scale applications
➢ HelmGemm extension: hardware, middleware and software
➢ Hardware support: 4x GPUs, 2x FPGAs
HelmGemm: Managing GPUs and FPGAs for Transprecision GEMM Workloads in Containerized Environments
HelmGEMM overview
[Diagram: accelerators’ view of memory. CPU P9 with System Memory, GPU V100 with GPU Memory, FPGA 9V3 with FPGA Memory.]
Link labels: NVLink (3 bricks x 50 GBps), CAPI2 (32 GBps), OpenCAPI (<50 GBps), 120 GBps/socket
➢ (Open)CAPI technology enables an HPC node with unified memory for accelerators
HelmGEMM case study of Yolov3
Mapped to GPU V100 half precision: 140W
Mapped to FPGA 9V3 4-13bits: 32W
On CPU single precision
Mapped to GPU V100 half precision: 140W
On CPU single precision
➢ Heterogeneous execution of Yolov3 CNN on P9+V100+AD9V3 for energy efficiency
HelmGEMM evaluation
59.3x more performance
28.7x more energy efficiency
                     POWER9 AC922              POWER9 AC922 + V100 + 9V3
Cost                 $21,878                   $30,587
Performance          83 G(fl)Ops/sec           4.94 T(fl)Ops/sec
Energy efficiency    0.16 G(fl)Ops/sec/Watt    4.78 G(fl)Ops/sec/Watt
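As a sanity check, the ratios can be recomputed from the rounded figures on this slide. The dictionary layout and the performance-per-dollar metric are additions for illustration. The recomputed values come out close to, but not exactly at, the headline 59.3x and 28.7x, which is plausibly a rounding effect in the inputs shown here.

```python
# Rounded figures as printed on the slide.
baseline = {"cost_usd": 21878, "gops": 83.0, "gops_per_watt": 0.16}
hetero = {"cost_usd": 30587, "gops": 4940.0, "gops_per_watt": 4.78}  # 4.94 T(fl)Ops/sec

speedup = hetero["gops"] / baseline["gops"]
efficiency_gain = hetero["gops_per_watt"] / baseline["gops_per_watt"]
cost_ratio = hetero["cost_usd"] / baseline["cost_usd"]

# Extra metric (not on the slide): performance per dollar.
perf_per_dollar_gain = speedup / cost_ratio

print(f"performance:       {speedup:.1f}x")
print(f"energy efficiency: {efficiency_gain:.1f}x")
print(f"cost:              {cost_ratio:.2f}x")
print(f"perf per dollar:   {perf_per_dollar_gain:.1f}x")
```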
Composable systems with FPGAs
▪ Thousands of tiny CPUs using high parallelization
▪ compute intensive application
▪ SIMD-oriented workloads
GPU FPGA
▪ Logic + IOs are customized exactly for the
application's needs.
▪ Very low and predictable latency applications
▪ MIMD-oriented workloads
New AI HW
Byte-addressable
Byte-addressable
External: Byte-addressable
Internal : >Bit-addressable
fp16, fp8, int4 int2
Communication for low-precision AI HW
PHRYCTORIA motivation:
Traditional communication mechanisms
for modern low-precision data-types
(e.g. brain-float16, int5) cannot exploit
the bandwidth of emerging
communication links for FPGA
accelerators (e.g. OpenCAPI, PCIe4, etc).
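Why byte-addressable transport wastes link bandwidth on low-precision types can be illustrated with dense bit packing. This is a generic sketch, not PHRYCTORIA's actual wire format; the helper names are hypothetical and the values are assumed to be unsigned 5-bit codes.

```python
def pack_bits(values, width):
    """Pack unsigned `width`-bit integers back to back (little-endian bit order)."""
    acc = 0
    nbits = 0
    out = bytearray()
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << nbits
        nbits += width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)  # final partial byte, zero-padded
    return bytes(out)

def unpack_bits(data, width, count):
    """Inverse of pack_bits: recover `count` unsigned `width`-bit integers."""
    acc = int.from_bytes(data, "little")
    mask = (1 << width) - 1
    return [(acc >> (i * width)) & mask for i in range(count)]

values = list(range(32))              # 32 hypothetical int5 codes
packed = pack_bits(values, 5)         # 32 * 5 bits = 160 bits
container32_bytes = 4 * len(values)   # one 32-bit word per value
gain = container32_bytes / len(packed)
print(f"{len(packed)} bytes packed vs {container32_bytes} bytes in 32-bit words ({gain:.1f}x)")
```

If the baseline is one 32-bit container per value (an assumption), dense packing gives 32/16 = 2x for brain-float16 and 32/5 = 6.4x for int5, which is roughly in line with the 2x and 6x figures quoted on this slide.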
Heterogeneous System: IBM IC922, 2 POWER9 CPUs,
AlphaData ADM-9H7 (Xilinx VU37P FPGA), OpenCAPI 3.0 25 Gbps x8.
-2x BW utilization for brain-float16
-6x BW utilization for int5
Name inspired after the ancient Greek
communication system “ΦΡΥΚΤΩΡΙΑ”, 1900 B.C.
Bits
Mathematics + Information
Today’s Computers and
Supercomputers
Neurons
Biology + Information
Today’s AI Systems
Qubits
Physics + Information
Today’s Quantum Systems
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and
William M. J. Green, ISSCC2020: Plenary
How we
get there
What’s Next in computing &
the role of cloud FPGAs
https://analog-ai-demo.mybluemix.net/
o The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
o E. Eleftheriou et al., "Deep learning acceleration based on in-memory computing," in IBM Journal of Research
and Development, vol. 63, no. 6, pp. 7:1-7:16, 1 Nov.-Dec. 2019
o https://www.research.ibm.com/artificial-intelligence/ai-hardware-center/
https://www.ibm.com/blogs/research/2019/02/ai-hardware-center/ , Feb. 2019
o Extending performance by 2.5X / year through 2025
o Approximate computing principles applied to Digital AI Cores with
reduced precision and Analog AI cores (Non von Neumann HW)
o PCM devices can store synaptic weights in their analog
conductance state. Arranged in a crossbar configuration, they
perform an analog matrix-vector multiplication in a single time
step, exploiting multi-level storage capability and Kirchhoff’s
circuit laws.
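The crossbar operation can be written down digitally as a reference model: by Ohm's law each device contributes a current G·V, and by Kirchhoff's current law the currents on a shared column wire add up, so column j carries I_j = Σ_i G[i][j]·V[i]. The conductance and voltage values below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def crossbar_mvm(conductances, voltages):
    """Ideal crossbar read-out: column current I_j = sum_i G[i][j] * V[i].

    Rows carry the input voltages, columns collect the summed currents.
    Device noise, wire resistance and non-linearity are ignored.
    """
    rows = len(conductances)
    cols = len(conductances[0])
    if len(voltages) != rows:
        raise ValueError("one voltage per crossbar row is required")
    return [
        sum(conductances[i][j] * voltages[i] for i in range(rows))
        for j in range(cols)
    ]

# Hypothetical 3x2 crossbar: weights stored as conductances (siemens).
G = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
V = [0.1, 0.2, 0.3]  # input activations encoded as row voltages (volts)

currents = crossbar_mvm(G, V)
print(currents)
```

In the analog device all column sums form simultaneously, which is why the multiplication takes a single time step regardless of the number of rows.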
https://www.technologyreview.com/2020/12/11/1014102/ai-trains-on-4-bit-computers/
https://qiskit.org/ https://www.ibm.com/quantum-computing/experience/
At CES 2020, IBM research director Dario Gil gave the audience a primer on quantum computing and predicted that the industry will achieve quantum
advantage this decade.
Accelerated discovery: Bits + Neurons + Qubits
Deep Search: Bits + Neurons
Intelligent Simulation: Bits + Qubits
Generative Models: Bits + Neurons
Autonomous Labs: Bits + Neurons
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and
William M. J. Green, ISSCC2020: Plenary
How we get there
What’s Next in computing &
the role of cloud FPGAs
The role of cloud FPGAs
1. FPGAs are eligible to become 1st class citizens
➢ Standalone approach sets the FPGA free from the CPU
o Large scale deployment of FPGAs independent of #servers
o Significantly lowers the entry barrier
➢ Promotes the use of medium and low-cost FPGAs
2. The network-attachment model
➢ Makes FPGAs IP-addressable and scalable in DCs
o Users can rent and link them in any type of topology
➢ Opens the path for use of FPGAs in large scale applications
o Serverless computing, HPC, DNN inference, Signal Processing, …
3. The hyperscale infrastructure
➢ Integrates FPGAs at the chassis (aka drawer) level
➢ Combines passive and active water cooling
➢ Key enabler for FPGAs to become plentiful in DCs
The Future of Computing: Bits + Neurons + Qubits, Dario Gil and
William M. J. Green, ISSCC2020: Plenary
FCCM2020 Workshop: THE FUTURE OF FPGA-ACCELERATION IN CLOUD AND DATA CENTERS
cloudFPGA: Promote FPGAs to 1st Citizen in the Cloud, Francois Abel, IBM Research Europe
http://www.fccm.org/proceedings/2020/Workshops/Future_of_FPGA_Workshop/cloudFPGA.pdf
Research Ecosystem
Team Collaboration
Burkhard Ringlein, Francois Abel, Beat Weiss, Mitra Purandare,
Florian Auernhammer (OCAPI), Raphael Polig (ZYC2),
Christoph Hagleitner, Mark Lantz
ZRL Collaboration
Florian Scheidegger, Cristiano Malossi (H2020 OPRECOMP)
Eindhoven University of Technology
Gagandeep Singh, Sander Stuijk, Henk Corporaal
Former NeMeCo ZRL colleagues: Jan van Lunteren, Ronald Luijten
• COOLCHIPS2020, WiP: Automated precision-tuning methods
for deep learning models on FPGAs and IoT devices.
• ISCAS2018, COOLCHIPS2018, FPT2018, ASAP2019,
SAMOS2019, DATE2019, RAW-IPDPS2020, FPL2019,
FPL2020, FCCM2020, H2RC2020
• FPL2019, SAMOS2019, DATE2019, FPL2020. NeMeCo H2020
project, near-memory accelerators for weather modeling
ETH
Stefan Mach, Fabian Schuiki, Germain Haugou,
Michael Schaffner, Frank K. Gurkaynak, and Luca Benini
• COOLCHIPS2020, Transprecision PULP on IBM-ZRL Cloud
ETH
Kaan Kara (now Oracle), Dimitris Syrivelis (former IBM-
Ireland colleague, now Nvidia) Gustavo Alonso
• FPL2020, In-memory database acceleration (FPGA +
OpenCAPI + HBM)
Barcelona Supercomputing Center
Abbas Haghi, Lluc Alvarez, Santiago Marco, Miquel Moreto
• FPL2020, Genomics acceleration with CAPI2-FPGA
IBM France – IBM China
Bruno Mesnet, Alexandre Castellane, Yong Lu (now cyansemi)
• SNAP(CAPI1/2) and OC-Accel(OpenCAPI), OpenPOWER
Summit 2018, 2019
There is no one-man-show
ETH
Gagandeep Singh, Juan Gómez-Luna, Onur Mutlu
• FPGA2021, Optimizing Near-memory accelerators with ML
What’s Next in FPGAs for cloud is agility
Infrastructure | Software | Automation | Composability
ΣΚΕΨΟΥ !
THINK
BIG !
Transprecision Computing DL: Constrained model synthesis for IoT applications
[Figure: automated ML model synthesis for a given edge device]
USER INPUT: dataset (images, sensors, audio); IoT budget & requirements (inference time, memory size, device type)
OUTPUT: CASE #1, CASE #2, CASE #3, … FPGA ?
Example tasks: visual inspection, anomaly detection
Sood, A. et al. “NeuNetS: An Automated Synthesis Engine for Neural Network Design.” ArXiv abs/1901.06261 (2019)
F. Scheidegger et al. “Constrained deep neural network architecture search for IoT devices accounting for hardware calibration”, NeurIPS2019
Call for EU-funded PhD: Deep Learning Algorithms for Budget Constrained Applications in the IoT Domain
https://tuni.rekrytointi.com/paikat/?o=A_RJ&jgid=3&jid=794
PHRYCTORIA overview
Protocol Buffer
Serialization/Deserialization
Byte-addressable
enterprise system
Low-precision
FPGA accelerator
Synthetic FloatX dataset
NLP dataset
PHRYCTORIA: A Messaging System for Transprecision OpenCAPI-attached FPGA
Accelerators, D.Diamantopoulos et al., RAW-IPDPS2020
6.3x-7.4x
goodput BW
-4.8x MB
6.9x goodput BW
Compatible with any gRPC-supported device/service
Survey and Benchmarking of Machine Learning Accelerators , 2019, MIT Lincoln Laboratory Supercomputing Centre
https://arxiv.org/abs/1908.11348
“Best” choice depends on requirements for
o Throughput (fps),
o Latency (ms),
o Energy efficiency (fps/watt),
o Cost efficiency (fps/$),
o Accuracy
What’s Next in FPGAs for cloud
is agility, operationalized
Autotuning of a Transprecision Tensor Accelerator
Agile Autotuning of a Transprecision Tensor
Accelerator Overlay D.Diamantopoulos et al., FPL2020
Instead of shrinking the hardware design space by pruning, we propose
a technique that builds a prediction model which quantifies the impact
of each hardware design choice on an optimization goal.
By generating the overlay from the most important features, auto-tuning
achieves up to 2.5x higher performance and up to 8.1x faster convergence.
-53x
GPU memory sharing in containerized systems can lead to
GPU performance inefficiencies that fall within the performance
envelope of FPGAs, which operate on a power budget one
order of magnitude lower.
The case of AI HW with
GPUs & FPGAs
Is your neural network memory-bound or compute-bound on your NEW AI HW?
How does it matter for the cloud?
What does “AI HW diverseness” mean for an enterprise system?
Selected ML workloads
Aggressive bit-width optimization for every AI HW device
Simulations to establish which parts of an application can be mapped
to lower precision such that their accuracy is not degraded
• DeepSpeech and Language
Modeling (Euclidean distance
compared to fp32 for 100%
accuracy)
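The kind of check described here, comparing a reduced-precision result against the fp32 reference by Euclidean distance, can be sketched as follows. The uniform quantizer, the value range and the synthetic test signal are assumptions for illustration and are not the method used in the study.

```python
import math

def quantize(values, bits, lo=-1.0, hi=1.0):
    """Uniform quantizer: snap each value to one of 2**bits levels in [lo, hi]."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((min(max(v, lo), hi) - lo) / step) * step for v in values]

def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Synthetic stand-in for a tensor computed at full precision.
reference = [math.sin(0.37 * k) for k in range(64)]

errors = {
    bits: euclidean(reference, quantize(reference, bits))
    for bits in (2, 4, 8)
}
for bits, err in errors.items():
    print(f"{bits}-bit: distance to reference = {err:.4f}")
```

A part of the application would be a candidate for a given bit-width when its distance to the reference stays below a tolerance chosen for the task.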
Distribution of the workloads to the lower-precision counterparts as a
code-coverage percentage.
Yolov3: aggressive bit-width optimization such that classification accuracy on ImageNet is not less than
72.9% (top-1) and 91.2% (top-5).
HelmGEMM measurements

  • 13. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 13 Maslow's hierarchy of needs: Self-actualization needs A bit of motivation “This level of need refers to the realization of one's full potential. Maslow describes this as the desire to accomplish everything that one can, to become the most that one can be. People may have a strong, particular desire to become an ideal parent, succeed athletically, or create paintings, pictures, or inventions. To understand this level of need, a person must not only succeed in the previous needs but master them. Self-actualization can be described as a value-based system when discussing its role in motivation. Self- actualization is understood as the goal or explicit motive, and the previous stages in Maslow's Hierarchy fall in line to become the step-by- step process by which self-actualization is achievable; an explicit motive is the objective of a reward-based system that is used to intrinsically drive completion of certain values or goals. Individuals who are motivated to pursue this goal seek and understand how their needs, relationships, and sense of self are expressed through their behavior. Self-actualization can include: Partner Acquisition, Parenting, Utilizing & Developing Talents & Abilities, Pursuing goals.” What’s Next in computing & the role of cloud FPGAs Abraham Harold Maslow was a psychology professor at Alliant International University, Brandeis University, Brooklyn College, New School for Social Research, and Columbia University. Quoted text and image source: Wikipedia Vertical Needs Physiological Needs Safety Needs Social Needs Esteem Needs Parenting Goals Horizontal Needs: Self-actualization Talents
  • 14. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 14 Maslow's hierarchy of needs: Transcendence needs A bit of motivation “In his later years, Abraham Maslow explored a further dimension of motivation, while criticizing his original vision of self-actualization. By these later ideas, one finds the fullest realization in giving oneself to something beyond oneself—for example, in altruism or spirituality. He equated this with the desire to reach the infinite. Transcendence refers to the very highest and most inclusive or holistic levels of human consciousness, behaving and relating, as ends rather than means, to oneself, to significant others, to human beings in general, to other species, to nature, and to the cosmos” Maslow 1971, p. 269 What’s Next in computing & the role of cloud FPGAs Abraham Harold Maslow was a psychology professor at Alliant International University, Brandeis University, Brooklyn College, New School for Social Research, and Columbia University. Quoted text and image source: Wikipedia Vertical Needs Physiological Needs Safety Needs Social Needs Esteem Needs Self-actualization Needs Transcendence
  • 15. Computing hierarchy of needs: Physical needs. A bit of motivation. Diagram labels: Horizontal Needs (labelled “Physiological” on the slide): Information Representation, Materials, Power, Thermal. Vertical Needs: …
    Claude Shannon, the origins of information theory (image source: Wikipedia). He founded digital circuit design theory in 1937, when he wrote his thesis demonstrating that electrical applications of Boolean algebra could construct any logical numerical relationship.
    Assumption: separation of information from physics. That separation is being “challenged” by quantum computing today (0, 1 versus 0+1).
    Images: a 12-row/80-column IBM punched card from the mid-twentieth century, and a section of DNA whose bases lie vertically between the two spiraling strands (image source: Wikipedia). Prior to Shannon those things had nothing in common. Today we get to see them both as processors or carriers of information.
  • 16. Computing hierarchy of needs: Technological needs. A bit of motivation. Diagram labels: Horizontal Needs (Technological): Transistor, Variability, Yield, Aging, … Vertical Needs: Physical Needs, … Source: H.-S. P. Wong et al., "A Density Metric for Semiconductor Technology [Point of View]," Proceedings of the IEEE, vol. 108, no. 4, pp. 478-482, April 2020, doi: 10.1109/JPROC.2020.2981715.
  • 17. Moore’s Law End? Really? “medium-K, oxide-minimized, semi-strained, anti-dielectric half-pitch”?
    Transistor scaling: Intel, IEDM 2019: germanium-based GAAFET PMOS device layer on top of a more traditional silicon FinFET NMOS.
    System scaling: beyond the transistor, e.g. Intel’s EMIB (Embedded Multi-die Interconnect Bridge) and Foveros to connect chiplets in both 2 and 3 dimensions (HBM in CPU-GPU).
    • 5 chipmakers/foundries in the 16nm/14nm market: GlobalFoundries, Intel, Samsung, TSMC, and UMC; plus SMIC (14nm finFETs).
    • GlobalFoundries and UMC last year halted their respective 7nm process efforts.
    • Currently, TSMC's 7nm process is at its peak (orders from AMD for its Ryzen 3000-series CPUs and Navi graphics cards). Huge investment in 5nm.
    • Compared to 7nm, Samsung’s 5nm finFET technology provides up to a 25% increase in logic area efficiency with 20% lower power or 10% higher performance.
    • TSMC expects mass 3nm production in 2022.
    • A nanosheet FET is a type of gate-all-around (GAA) architecture. That’s not the only possible scenario. “The industry is very conservative. They will try to extend the finFET as much as possible,” IMEC’s Naoto Horiguchi said. “At 3nm, we have a window to use a finFET. But we need several process innovations for finFET in terms of overall improvement.”
    • TSMC announced starting 2nm development (Apr. 2020).
    Sources: https://semiengineering.com/5nm-vs-3nm/ ; The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC 2020 plenary; https://semiwiki.com/eda/synopsys/294205-what-might-the-1nm-node-look-like/ ; Nadine Collaert, "1.3 Future Scaling: Where Systems and Technology Meet," 2020 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 2020, pp. 25-29, doi: 10.1109/ISSCC19947.2020.9063033.
    o “10 micron” in 1972 through “0.35 micron” in 1995, an impressive 23-year run where the node name matched gate length.
    o Then, in 1997, with the “0.25 micron/250 nm” node they started over-achieving with an actual Lg of 200 nm, 20% better than the name would imply.
    o This “sandbagging” continued through the next 12 years, with one node (130nm) having an Lg of only 70nm, almost a 2x buffer. Then, in 2011, Intel jumped over to the other side of the ledger, ushering in what we might call the “overstating decade” with the “22nm” node sporting an Lg of 26 nm. Since then, things have continued to slide further in that direction, with the current “10nm” node measuring in with an Lg of 18 nm, almost 2x on the other side of the “named” dimension.
    o Most industry folks understand that Intel’s “10nm” process is roughly equivalent to TSMC and Samsung’s “7nm” processes.
    Source: https://www.eejournal.com/article/no-more-nanometers/ (July 23, 2020)
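The drift between node names and actual gate lengths quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the (node name, Lg) pairs cited above; the helper name `naming_ratio` and the wording of the labels are illustrative choices, not terms from the source.

```python
# Node name vs. actual gate length (Lg), both in nm, as quoted on slide 17.
NODES = {
    "250nm (1997)": (250, 200),
    "130nm": (130, 70),
    "22nm (2011)": (22, 26),
    "10nm (current)": (10, 18),
}


def naming_ratio(node_nm, lg_nm):
    """Node name divided by Lg.

    A value above 1 means the real gate is shorter than the name suggests
    (the "sandbagging" era); below 1 means the name overstates the scaling.
    """
    return node_nm / lg_nm


for label, (node_nm, lg_nm) in NODES.items():
    ratio = naming_ratio(node_nm, lg_nm)
    kind = "name is conservative" if ratio > 1 else "name is optimistic"
    print(f"{label}: name/Lg = {ratio:.2f} ({kind})")
```

For the 130nm and 10nm nodes the ratio (or its inverse) comes out close to 2, which is what the slide means by "almost 2x" on either side of the named dimension.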
  • 18. Moore’s Law End? Really? I prefer to respect that it is aging fairly gracefully! Transistor scaling, cost scaling.
    • The cost to design a 28nm planar device ranges from $10 million to $35 million (Gartner).
    • The cost to design a 7nm system-on-a-chip (SoC) ranges from $120 million to $420 million (Gartner).
    • 5nm is a completely new process with updated EDA tools and IP. The cost to design a 5nm device ranges from $210 million to $680 million (Gartner).
    • $20B per fab at 3nm (IBS).
    • The NRE costs cover R&D for optical proximity correction, multi-patterning, and extreme ultraviolet (EUV) lithography. https://www.eejournal.com/article/no-more-nanometers/
    • After 5nm, the next full node is 3nm. But 3nm is not for the faint of heart.
    • The cost to design a 3nm device ranges from $500 million to $1.5 billion, according to IBS.
    • Process development costs range from $4 billion to $5 billion, while a fab runs $15 billion to $20 billion, according to IBS.
    • “Transistor costs at 3nm are expected to be 20% to 25% higher than at 5nm based on same level of maturity,” IBS’ Jones said. “Expect 15% more performance and with 25% less power consumption compared to 5nm finFETs.” https://semiengineering.com/5nm-vs-3nm/
  • 19. Computing hierarchy of needs: Architectural needs. A bit of motivation.
    1. Computer performance was driven by clock speeds.
    2. Facing several physical walls, the driver shifted from clock speed to parallelism (Amdahl’s law: parallelism will soon be limited for nonscientific computations).
    3. An unavoidable path towards specialized devices (such as ASICs, DPUs, TPUs, IPUs, ...). Thus, computer performance will probably need to seek another driving factor.
    Figure: Gordon Moore’s law and its derivatives (T: transistor total), with phases 1, 2, 3 marked. Source: Klauer, Bernd. “The Convey Hybrid-Core Architecture.” (2013).
    Diagram labels: Horizontal Needs (Architectural): (μ)-architecture; Edge-to-cloud, HPC; Storage; Latency, Cost, …, Energy efficiency. Vertical Needs: Physical Needs, Technological Needs, …
  • 20. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 20 Computing hierarchy of needs: Software needs A bit of motivation What’s Next in computing & the role of cloud FPGAs Horizontal Needs: Software Vertical Needs . . . Physical Needs . . . Technological Needs . . . Languages Libraries Virtualization Architectural Needs . . .
  • 21. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 21 Computing hierarchy of needs: Cloud needs A bit of motivation What’s Next in computing & the role of cloud FPGAs Horizontal Needs: Cloud Vertical Needs . . . Physical Needs . . . Technological Needs Architectural Needs Software Needs PaaS, IaaS, …, FaaS 5G/6G Edge, IoT, V2X . . . https://amulya-bhatia.medium.com/iaas-vs-caas-vs-paas-vs-faas-vs-saas-whats-the-difference-ee84ecc2d519 . . . Comparison of the 4G (IMT-Advanced) and 5G (IMT-2020) specifications. Source: ETSI
  • 22. Computing hierarchy of needs: AI needs. A bit of motivation.
    Vertical needs: Physical Needs, Technological Needs, Architectural Needs, Software Needs, Cloud Needs, AI.
    o Narrow AI (emerging): deep learning; single-task, single-domain, with superhuman accuracy; requires large amounts of labeled data; CPU & GPU. We are here now.
    o Broad AI (disruptive and pervasive): neuro-symbolic AI; multi-task, multi-domain, multi-modal; trusted AI capable of learning with much less data; reduced precision and analog HW.
    o General AI (revolutionary): true neuro-AI; cross-domain learning and reasoning; broad autonomy with moral reasoning; wetware? Transcendence? (personal view)
    “Transcendence refers to the very highest and most inclusive or holistic levels of human consciousness, behaving and relating, as ends rather than means, to oneself, to significant others, to human beings in general, to other species, to nature, and to the cosmos” (Maslow 1971, p. 269).
    Source: The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC 2020 plenary.
  • 23. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 23 My view on motivation for human-centric trusted AI A bit of motivation What’s Next in computing & the role of cloud FPGAs . . . Physical Needs . . . Technological Needs Architectural Needs Software Needs AI Cloud Needs . . . . . . Physiological Needs Safety Needs Social Needs Esteem Needs Self-actualization Needs Transcendence Augment human2human & human2cosmos consciousness Computing hierarchy of needs Maslow's hierarchy of needs
  • 24. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 24 https://aif360.mybluemix.net/ https://www.ibm.com/blogs/research/2018/04/ai-adversarial-robustness-toolbox/ http://aix360-dev.mybluemix.net/?_ga=2.230889183.1995265854.1610364654-99329142.1609856291 https://www.research.ibm.com/artificial-intelligence/trusted-ai/
  • 25. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 25 Bits Mathematics + Information Today’s Computers and Supercomputers Neurons Biology + Information Today’s AI Systems Qubits Physics + Information Today’s Quantum Systems The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary How we get there What’s Next in computing & the role of cloud FPGAs
  • 26. A bit of motivation. Roofline plot: attainable performance in G(FL)OPS (vertical axis) vs. computation-to-communication ratio in (FL)OP/Byte (horizontal axis), with a CPU computation roof and a CPU memory roof. “In 1945, while consulting for the Moore School of Electrical Engineering on the EDVAC project, von Neumann wrote an incomplete set of notes, titled the First Draft of a Report on the EDVAC. This widely distributed paper laid foundations of a computer architecture in which the data and the program are both stored in the computer’s memory in the same address space, which will be described later as von Neumann Architecture (drawing at right). This architecture became the de facto standard for a long time and is still used today (until technology enabled more advanced architectures).” https://history-computer.com/john-von-neumann-biography-history-and-inventions/
  • 27. A bit of motivation. Roofline plot (attainable performance in G(FL)OPS vs. computation-to-communication ratio in (FL)OP/Byte; CPU computation roof, CPU memory roof). Regions: memory-bound (under the CPU memory roof) and compute-bound (under the CPU computation roof). The optimal operation point is where the two roofs meet: neither the memory bandwidth nor the CPU is under-utilized.
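The plot on these slides is commonly known as the roofline model: attainable performance is the lower of the computation roof and the memory roof, where the memory roof grows linearly with the (FL)OP/Byte ratio. A minimal sketch follows; the peak and bandwidth numbers are hypothetical and chosen only to make the two regions visible, and the function names are illustrative.

```python
def attainable_gflops(intensity, peak_gflops, mem_bw_gbs):
    """Roofline bound for a kernel with the given (FL)OP/Byte intensity."""
    return min(peak_gflops, intensity * mem_bw_gbs)


def ridge_point(peak_gflops, mem_bw_gbs):
    """Intensity at which the memory roof meets the computation roof."""
    return peak_gflops / mem_bw_gbs


# Hypothetical CPU: 1000 G(FL)OPS peak, 100 GB/s memory bandwidth.
PEAK_GFLOPS = 1000.0
MEM_BW_GBS = 100.0

for intensity in (0.5, 10.0, 50.0):
    perf = attainable_gflops(intensity, PEAK_GFLOPS, MEM_BW_GBS)
    if intensity < ridge_point(PEAK_GFLOPS, MEM_BW_GBS):
        region = "memory-bound"
    else:
        region = "compute-bound"
    print(f"{intensity:5.1f} (FL)OP/Byte -> {perf:7.1f} G(FL)OPS ({region})")
```

The ridge point is the "optimal operation point" of the slide: to its left the memory bandwidth limits the kernel, to its right the compute units do.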
  • 28. A bit of motivation. Roofline plot (attainable performance in G(FL)OPS vs. computation-to-communication ratio in (FL)OP/Byte; CPU computation roof, CPU memory roof; compute-bound and memory-bound regions), with a future CPU computation roof added. Amdahl’s Law & Dark Silicon: the future is not 1000s of conventional cores. “Amdahl’s Law of specialization”: is it better to speed up 1% of apps by 100× or all apps by 1%?
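The question on this slide can be made concrete with the classic Amdahl formula, speedup = 1 / ((1 - f) + f / s). The sketch below assumes, purely for illustration, that "1% of apps" means 1% of the total execution time across the workload; the slide itself does not specify how the share is measured.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the runtime is accelerated by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Option 1: accelerate 1% of the workload by 100x.
few_apps_fast = amdahl_speedup(0.01, 100.0)

# Option 2: accelerate the whole workload by 1% (a factor of 1.01).
all_apps_slightly = amdahl_speedup(1.0, 1.01)

print(f"1% of the work, 100x faster:  {few_apps_fast:.4f}x overall")
print(f"100% of the work, 1% faster:  {all_apps_slightly:.4f}x overall")
```

Under this assumption both options give roughly the same overall gain of about 1%, which is the point of the slide: a specialized accelerator pays off only if the accelerated part is a large share of what actually runs.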
  • 29. A bit of motivation. Roofline plot (attainable performance in G(FL)OPS vs. computation-to-communication ratio in (FL)OP/Byte; CPU computation roof, CPU memory roof; compute-bound and memory-bound regions). System specialization using accelerators: architectures designed with a specific class of computations in mind. Added roof: accelerator memory.
  • 30. A bit of motivation. Roofline plot, same axes and CPU roofs. Added: Application A (specifications) and the arithmetic intensity of A, which depends only on the application’s characteristics.
  • 31. A bit of motivation. Roofline plot, continued. Added: the computing performance “needs” of Application A (performance, power).
  • 32. A bit of motivation. Roofline plot, continued. Step: coding Application A (C, C++, Java, Python). New point: baseline performance for A.
  • 33. A bit of motivation. Roofline plot, continued. Step: compiler optimizations (gcc -O3 …). New point: A w/ comp. opt.
  • 34. A bit of motivation. Roofline plot, continued. Step: multi-core (pthreads, OpenMP, …). New point: A w/ multi-core.
  • 35. A bit of motivation. Roofline plot, continued. Step: SIMD (SSE, AVX, …). New point: A w/ SIMD.
  • 36. A bit of motivation. Roofline plot, continued. Step: DVFS (freq. boost, …). New point: A w/ DVFS.
  • 37. A bit of motivation. Roofline plot, continued. Step: manual code optimization (profiling and fun…). New point: A w/ manual code opt.
  • 38. A bit of motivation. Roofline plot, continued. Step: buy accelerator I (GPU, TPU, ASIC, …). Added roofs labelled Accelerator I and Memory. New point: A w/ accelerator I. Assuming the CPU-accelerator bandwidth is sufficient!
  • 39. A bit of motivation. Roofline plot, continued. Step: use the vendor libraries of accelerator I (cuBLAS, cuDNN, …). Points: A w/ accelerator I (HW) and A w/ accelerator I (HW+SW). Assuming the CPU-accelerator bandwidth is sufficient!
  • 40. A bit of motivation. Roofline plot, continued. Step: “?”. Added: Application B, with the arithmetic intensity of B and the computing performance “needs” of Application B.
  • 41. A bit of motivation. Roofline plot, continued. Step: buy accelerator II with better memory (HBM2, …). Added roof: Accelerator II.
  • 42. A bit of motivation. Roofline plot, continued. Step: buy accelerators III, IV, V, … Added: further accelerator roofs and Applications C, D, E.
  • 43. A bit of motivation. Roofline plot, continued. Step: buy FPGA: o custom logic, o custom memory, o custom interconnects. A single FPGA roof serves Applications A, B, C, D, E. Not only custom, but also reconfigurable within seconds!
  • 44. Silicon alternatives for rapid enterprise-ready specialization. ✓ Reconfigurable logic ✓ Reconfigurable memory ✓ Reconfigurable interconnects. ASICs.
    ▪ A GPU is effective at processing the same set of operations in parallel: single instruction, multiple data (SIMD). A GPU has a well-defined instruction set and fixed word sizes, for example single, double, or half-precision integer and floating-point values.
    ▪ An FPGA is effective at processing the same or different operations in parallel: multiple instructions, multiple data (MIMD). An FPGA does not have a predefined instruction set or a fixed data width.
    Figures source: AWS, Announcing Amazon EC2 F1 Instances with Custom FPGAs; Bringing Hardware Acceleration closer to the programmer, Ecoscale-ExaNest workshop, 2017.
  • 45. ASICs vs GPUs vs TPUs vs DPUs vs FPGAs vs Apples vs … Why FPGAs? Figure: GPUs vs. FPGAs. How does the comparable raw performance of FPGAs and GPUs bring growth? Source: WP492 (v1.0.1), June 13, 2017, Xilinx All Programmable Devices: A Superior Platform for Compute-Intensive Systems.
  • 46. TPUs vs GPUs vs FPGAs vs ASICs vs … (FPS/TOPS). After Google announced the scaling capabilities of TPUv4 [1], Nvidia adopted the "per-chip" metric for A100 [2]. So what is the “right” granularity? Who defines that? (No one!) MLPerf has a diverse set of benchmarks which unveil various system bottlenecks [3], but it favors FLOPS, i.e. a game in which FPGAs are not at their strongest. There are a zillion companies out there doing inference [4], and they make a lot of claims, but who is going to have the biggest ROI for improving results?
    [1] https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer
    [2] https://www.eetimes.com/nvidia-google-both-claim-mlperf-training-crown/
    [3] https://ieeexplore.ieee.org/document/9238612
    [4] https://basicmi.github.io/AI-Chip/
  • 47. Microsoft’s cloud strategy favors FPGAs … from a research vision in 2015 … “If you’re not sure of the optimal algorithms for say compression or encryption for the data you’re processing, or the data shape is going to be changing over time so you don’t want to take the risk of burning it to the silicon, you can experiment and be agile on FPGAs” (Azure Chief Technology Officer, Mark Russinovich). Figure: the impact of FPGAs on query latency for Bing; even at double the query load, FPGA-accelerated ranking has lower latency than software-powered ranking at any load. Source: A. M. Caulfield et al., "A cloud-scale acceleration architecture," 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, 2016, pp. 1-13, doi: 10.1109/MICRO.2016.7783710.
  • 48. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 48 (Microsoft’s) Doug Burger's talk at FPL, Sep. 1st 2020 … to industry adoption in 2020!
  • 49. What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research 49 Live Video Transcoding Launch, Aaron Behman, Director of Video Product Marketing, Data Center Group https://www.xilinx.com/publications/presentations/video-transcoding-media-deck.pdf Definition: Capital expenditures (CapEx) are funds used by a company to acquire, upgrade, and maintain physical assets such as property, plants, buildings, technology, or equipment. CapEx is often used to undertake new projects or investments by a company. Capital Expenditure (CapEx) Definition - Investopedia www.investopedia.com
  • 50. What’s Next in FPGAs for cloud is agility. How do we envision increasing FPGA agility? Many enterprises experience a steady decline in their ability to coordinate & operationalize FPGA projects, particularly at scale. Hype source: Gartner, 2020, https://www.gartner.com/en/documents/3988006/hype-cycle-for-artificial-intelligence-2020
  • 51. Our visionary journey of FPGAs for cloud: 1. Infra, 2. Software, 3. Automation, 4. Composability. What’s Next in FPGAs for cloud is agility.
  • 52. Our visionary journey of FPGAs for cloud: 1. Infra (highlighted), 2. Software, 3. Automation, 4. Composability. What’s Next in FPGAs for cloud is agility.
  • 53. What’s Next in FPGAs for cloud is agility. Roofline plot (attainable performance in G(FL)OPS vs. computation-to-communication ratio in (FL)OP/Byte; CPU computation roof, CPU memory roof; Accelerator I roof and memory). Application A w/ accelerator I (HW+SW), with its arithmetic intensity and computing performance “needs”: (1) assuming the CPU-accelerator bandwidth is sufficient; (2) performance drops when the CPU-accelerator bandwidth is insufficient. Step: use FPGAs with high CPU-FPGA interconnect bandwidth.
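One way to picture the drop from point 1 to point 2 is to add the CPU-accelerator link as a third roof. The sketch below is a simplification that is not taken from the slides: it assumes every byte the kernel touches also crosses the link once, so the same (FL)OP/Byte intensity applies to both the accelerator memory and the link. All numbers and names are hypothetical.

```python
def offload_gflops(intensity, acc_peak_gflops, acc_mem_bw_gbs, link_bw_gbs):
    """Roofline bound with an extra roof for the CPU-accelerator link.

    Assumption (illustrative only): all data streams over the link once,
    so the link roof is intensity * link bandwidth.
    """
    return min(
        acc_peak_gflops,
        intensity * acc_mem_bw_gbs,
        intensity * link_bw_gbs,
    )


# Hypothetical accelerator: 4000 G(FL)OPS peak, 400 GB/s local memory.
ACC_PEAK = 4000.0
ACC_MEM_BW = 400.0
INTENSITY_A = 8.0  # hypothetical (FL)OP/Byte of Application A

for link_name, link_bw in (("slow link", 16.0), ("fast link", 400.0)):
    perf = offload_gflops(INTENSITY_A, ACC_PEAK, ACC_MEM_BW, link_bw)
    print(f"{link_name} ({link_bw:.0f} GB/s): {perf:.0f} G(FL)OPS attainable")
```

With the slow link the kernel is capped far below what the accelerator memory would allow, which is the motivation on the slide for FPGAs with a high-bandwidth CPU-FPGA interconnect.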
  • 54. 54 cloudFPGA and OpenCAPI-FPGAs in a nutshell What’s Next in computing & the role of cloud FPGAs / © D. Diamantopoulos 2021 IBM Research FPGA as a Co-Processor POWER9 AC922 + V100 + 9V3/9H7 Up to 8 OpenCAPI FPGAs per 2U chassis.
  • 55. Research in OpenCAPI-attached FPGAs. What’s Next in FPGAs for cloud is agility.
    In-Memory Data Analytics:
    o Kaan Kara, Christoph Hagleitner, Dionysios Diamantopoulos, Dimitris Syrivelis, Gustavo Alonso: High Bandwidth Memory on FPGAs: A Data Analytics Perspective. FPL 2020
    Weather modeling (http://www.cosmo-model.org):
    o Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Sander Stuijk, Henk Corporaal: NARMADA: Near-Memory Horizontal Diffusion Accelerator for Scalable Stencil Computations. FPL 2019
    o Gagandeep Singh, Dionysios Diamantopoulos, Christoph Hagleitner, Juan Gómez-Luna, Sander Stuijk, Onur Mutlu, Henk Corporaal: NERO: A Near High-Bandwidth Memory Stencil Accelerator for Weather Prediction Modeling. FPL 2020
    Genomics (image source: payodsoft.com):
    o Abbas Haghi, Lluc Alvarez, Jordà Polo, Dionysios Diamantopoulos, Christoph Hagleitner, Miquel Moretó: A Hardware/Software Co-Design of K-mer Counting Using a CAPI-Enabled FPGA. FPL 2020
  • 56. What’s Next in FPGAs for cloud is agility. Roofline plot (attainable performance in G(FL)OPS vs. computation-to-communication ratio in (FL)OP/Byte; CPU computation roof, CPU memory roof). Buy FPGA: o custom logic, o custom memory, o custom interconnects. Not only custom, but also reconfigurable within seconds! FPGA roof with Applications A, B, C, D, …, L: I have 12 applications, but in one 2U node I can attach at most 8 FPGAs. Well, multi-tenancy is also available on FPGAs, but the scaling problem remains in the long term…
  • 57. Use cloudFPGA: o End CPU slavery! o Deploy FPGAs at large scale in hyperscale DCs (~1000 FPGAs / rack). [Roofline plot: attainable performance, G(FL)OPS, vs. computation-to-communication ratio, (FL)OP/Byte, with FPGA roofs for Applications A, B, C, D, E.] Excuse my artistic and simplifying vision, which ignores: o the bindings of an application to SW libraries, o the runtime options, o the scheduler policies, o the resource-manager policies, o the control-plane and data-plane management, o the security, o and many, many more… THINK BIG!
  • 58. cloudFPGA and OpenCAPI-FPGAs in a nutshell. FPGA as a co-processor: POWER9 AC922 + V100 + 9V3/9H7, up to 8 OpenCAPI FPGAs per 2U chassis. FPGA as a peer-processor: 64 FPGAs in one 19"×2U chassis (64-port 10GbE = 640 Gb/s BW); in all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
  • 59. cloudFPGA and OpenCAPI-FPGAs in a nutshell. FPGA as a co-processor: https://github.com/OpenCAPI/oc-accel. FPGA as a peer-processor: https://github.com/...<STAY_TUNED>
  • 60. cloudFPGA concept. Goal → deploy FPGAs at large scale in hyperscale DCs (1-10s of thousands per DC). The FPGA becomes the node! Highlights: o dense → chassis w/ 64 compute units → ~1000 FPGAs / rack; o integration of 1st-level switch → full cross-sectional BW → low cost (cables / rack space); o energy efficient → no SW/FW overhead → no CPU overhead → (hot) water cooling; o self-hosted / network-attached → bare-metal support → scalable. Nodes shown: IP Address: 10.10.1.9, DRAM: 8GB, BRAM: 38MB, CLBs: 660,000, DSPs: 2760; IP Address: 10.10.1.50, DRAM: 32GB, Cores: 4.
  • 61. Standalone network-attached FPGA. 1. Replace PCIe I/F with integrated NIC (iNIC). 2. Turn FPGA card into a self-contained appliance. 3. Replace transceivers w/ backplane connectivity.
  • 62. One carrier sled = 32 FPGA modules. 1. Our first FPGA module uses a Xilinx Kintex UltraScale KU060 o A mid-range FPGA with high performance/price and low wattage
  • 63. One carrier sled = 32 FPGA modules. 1. Our first FPGA module uses a Xilinx Kintex UltraScale KU060 o A mid-range FPGA with high performance/price and low wattage
  • 64. Two carrier sleds per chassis = 64 FPGAs
  • 65. Sixteen chassis per rack = 1024 FPGAs
  • 66. cloudFPGA
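The density figures on the sled, chassis and rack slides multiply out as shown below. This is only a check of the slide arithmetic; the constant names are made up for the example, and the one 10GbE port per FPGA is inferred from the “64-port 10GbE” chassis description rather than stated directly.

```python
# Figures quoted on the slides.
MODULES_PER_SLED = 32      # one carrier sled = 32 FPGA modules
SLEDS_PER_CHASSIS = 2      # two carrier sleds per chassis
CHASSIS_PER_RACK = 16      # sixteen chassis per rack
PORT_SPEED_GBPS = 10       # assumption: one 10GbE port per FPGA

fpgas_per_chassis = MODULES_PER_SLED * SLEDS_PER_CHASSIS
fpgas_per_rack = fpgas_per_chassis * CHASSIS_PER_RACK
chassis_bw_gbps = fpgas_per_chassis * PORT_SPEED_GBPS

print(f"{fpgas_per_chassis} FPGAs per chassis, "
      f"{chassis_bw_gbps} Gb/s per chassis, "
      f"{fpgas_per_rack} FPGAs per rack")
```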
  • 67. Compute density, S822LC (aka Minsky) vs. FPGA chassis: ~2x INT8 TOPS, ~4x INT4 TOPS, ~8x INT2 TOPS, ~16x binary TOPS.
  • 68. Our visionary journey of FPGAs for cloud. Core themes: 1. Infra, 2. Software, 3. Automation, 4. Composability. Steps shown so far: 1. Infra, 2. Software.
  • 69. Who we built this for. cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.” cF developer #2: “I want to create or reuse RTL/HLS kernels while using standard APIs whenever possible.” cF developer #3: “I need to accelerate an application. I don’t know RTL/HLS and hardware design.” cF (Nirvana) developer #4: “I wasn’t aware the service I am using involved cloudFPGA.” Tools: ❑ cloudFPGA Studio ✓ Host: cFPy, Jupyter Lab ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL. ❑ cFDK ✓ Host: ZRLMPI, cFPy, OpenROLE (SW) ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with OpenROLE (HW). ❑ cFDK ✓ Host: custom API with TCP/UDP ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with AXI I/F. ❑ User’s front-end application integrates with cloud-native cF software that leverages cF nodes transparently ✓ Host: gRPC, REST API, ... ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL.
  • 70. Software action #1 (status: DONE): cFDK, REST API. cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.” Disclaimer: hardware in the FPGA world can be software too! ☺ E.g. “FPGAs for Software Programmers”, Dirk Koch, Frank Hannig, and Daniel Ziener. 2016. Springer Publishing Company, Incorporated.
  • 71. Software action #2 (status: on track): cFDK, REST API. cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.” cF developer #2: “I want to create or reuse RTL/HLS kernels while using standard APIs whenever possible.” FCCM2020 Workshop, THE FUTURE OF FPGA-ACCELERATION IN CLOUD AND DATA CENTERS: “openRole: Do we need a POSIX for FPGAs?”, Burkhard Ringlein, IBM Research Europe, http://www.fccm.org/proceedings/2020/Workshops/Future_of_FPGA_Workshop/2020-04-29_openRole_workshop_public-Burkhard%20Ringlein.pdf FPL2020 Workshop, DevOps support for Cloud FPGA platforms: “openRole: Can we bring ‘Design once, run everywhere’ to FPGAs?”, Burkhard Ringlein, IBM Research Europe.
  • 72. Software action #3 (status: on track): cFDK, REST API. Abstraction levels: Domain-specific Languages, Accelerated Libraries, Custom Accelerators. Kernels: Quantitative Finance, Weather Modeling, Computer Vision, Database Acceleration, DSP, Data Security, Linear Algebra, AI Inference, Data Analytics. cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.” cF developer #2: “I want to create or reuse RTL/HLS kernels while using standard APIs whenever possible.” cF developer #3: “I need to accelerate an application. I don’t know RTL/HLS and hardware design.”
  • 73. Software 1.0, Software 2.0, ML. Source: The road to Software 2.0, M. Loukides and B. Lorica, O’Reilly, December 10, 2019, https://www.oreilly.com/radar/the-road-to-software-2-0/
  • 74. Deploy Deep Learning Everywhere: Limitations. [Figure labels: new operator introduced by operator fusion optimization; potential benefit.] Source: TVM For Fun and Profit Tutorial, at FCRC 2019
  • 75. Build intelligent systems with learning (offline and online). Source: TVM For Fun and Profit Tutorial, at FCRC 2019
  • 76. End-to-end compilation flow for transprecision FPGAs. Source: Agile Autotuning of a Transprecision Tensor Accelerator Overlay, D. Diamantopoulos et al., FPL2020
  • 77. Our visionary journey of FPGAs for cloud. Core themes: 1. Infra, 2. Software, 3. Automation, 4. Composability. Steps shown so far: 1. Infra, 2. Software, 3. Automation.
  • 78. Who we built this for. Container-Savvy Developer: I can run my containerized application without having to worry about sizing, creating or managing a cluster. “Run my container” vs. “Give me a cluster, that I can then run my container on”. Functions Developer: I love Functions-as-a-Service and can now run them with almost no limits. I now have a single platform to securely combine Functions with Apps and other containerized workloads. PaaS/IaaS Developer: I can start utilizing a new powerful platform/infra and: - keep using a “push source code” experience - do not have to worry about containers - can easily connect my code to backing services. cF developer #1: “I want to create or reuse my RTL/HLS designs while designing HW and SW middleware.” cF developer #2: “I want to create or reuse RTL/HLS kernels while using standard APIs whenever possible.” cF developer #3: “I need to accelerate an application. I don’t know RTL/HLS and hardware design.” cF (Nirvana) developer #4: “I wasn’t aware the service I am using involved cloudFPGA.” Tools: ❑ cloudFPGA Studio ✓ Host: cFPy, Jupyter Lab ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL. ❑ cFDK ✓ Host: ZRLMPI, cFPy, OpenROLE (SW) ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with OpenROLE (HW). ❑ cFDK ✓ Host: custom API with TCP/UDP ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL with AXI I/F. ❑ User’s front-end application integrates with cloud-native cF software that leverages cF nodes transparently ✓ Host: gRPC, REST API, ... ✓ Kernel: VHDL, Verilog, C, C++, SystemC, OpenCL.
  • 79. Software-defined multi-FPGA fabric
  • 80. Our visionary journey of FPGAs for cloud. Core themes: 1. Infra, 2. Software, 3. Automation, 4. Composability. Steps shown so far: 1. Infra, 2. Software, 3. Automation, 4. Composability.
  • 81. Composable systems with FPGAs. Hybrid cloud. GPU: ▪ Thousands of tiny CPUs using high parallelization ▪ Compute-intensive applications ▪ SIMD-oriented workloads. FPGA: ▪ Logic + IOs are customized ▪ Very low and predictable latency ▪ MIMD-oriented workloads. New AI HW. cloudFPGA (https://www.zurich.ibm.com/cci/cloudFPGA): ▪ 64 FPGAs in one 19"×2U chassis (64-port 10GbE = 640 Gb/s BW). ▪ In all, 16 such chassis fit into a 42U rack for a total of 1024 FPGAs and 16 TB of DRAM.
  • 82. HelmGemm: AI HW fractionalization. ➢ Docker container service: multi-tenant environment with a high-level API to provide lightweight containers that run processes in isolation ➢ Kubernetes management: deploy, maintain, and scale applications ➢ HelmGemm extension: hardware, middleware and software ➢ Hardware support: 4x GPUs, 2x FPGAs. Source: HelmGemm: Managing GPUs and FPGAs for Transprecision GEMM Workloads in Containerized Environments
  • 83. HelmGEMM overview. ➢ (Open)CAPI technology enables an HPC node with unified memory for accelerators. [Diagram: CPU P9, GPU V100 and FPGA 9V3 with system memory, GPU memory and FPGA memory, plus the accelerators’ view of memory. Link labels: NVLink, 3 bricks x 50 GBps; CAPI2 (32 GBps); OpenCAPI (<50 GBps); 120 GBps/socket.]
  • 84. HelmGEMM case study of Yolov3. ➢ Heterogeneous execution of the Yolov3 CNN on P9 + V100 + AD9V3 for energy efficiency. [Figure labels: on CPU, single precision; mapped to GPU V100, half precision: 140 W; mapped to FPGA 9V3, 4-13 bits: 32 W.]
  • 85. HelmGEMM evaluation. POWER9 AC922: $21,878, 83 G(fl)Ops/sec, 0.16 G(fl)Ops/sec/Watt. POWER9 AC922 + V100 + 9V3: $30,587, 4.94 T(fl)Ops/sec, 4.78 G(fl)Ops/sec/Watt. Result: 59.3x more performance, 28.7x more energy efficiency.
  • 86. Composable systems with FPGAs. GPU: ▪ Thousands of tiny CPUs using high parallelization ▪ Compute-intensive applications ▪ SIMD-oriented workloads. FPGA: ▪ Logic + IOs are customized exactly for the application's needs ▪ Very low and predictable latency applications ▪ MIMD-oriented workloads. New AI HW. [Figure labels, in slide order: Byte-addressable; Byte-addressable; External: Byte-addressable, Internal: >Bit-addressable; fp16, fp8, int4, int2.]
  • 87. Communication for low-precision AI HW. PHRYCTORIA motivation: traditional communication mechanisms cannot exploit the bandwidth of emerging communication links for FPGA accelerators (e.g. OpenCAPI, PCIe4, etc.) when carrying modern low-precision data types (e.g. brain-float16, int5). Heterogeneous system: IBM IC922, 2x POWER9 CPUs, AlphaData ADM-9H7 (Xilinx VU37P FPGA), OpenCAPI 3.0 25 Gbps x8. [Figure labels: -2x BW utilization for brain-float16; -6x BW utilization for int5.] Name inspired by the ancient Greek communication system “ΦΡΥΚΤΩΡΙΑ”, 1900 B.C.
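Why sub-byte data types waste link bandwidth on a byte-addressable system can be illustrated with plain bit packing. The sketch below is not the PHRYCTORIA mechanism; the helper names `pack_bits` and `unpack_bits` are hypothetical. It only shows that 5-bit values stored one per byte occupy 8/5 of the space they need when packed densely.

```python
def pack_bits(values, width):
    """Pack unsigned integers of `width` bits each into a dense byte string
    (least-significant bits first)."""
    acc = 0        # bit accumulator
    nbits = 0      # number of valid bits currently in the accumulator
    out = bytearray()
    for v in values:
        if not 0 <= v < (1 << width):
            raise ValueError(f"{v} does not fit in {width} bits")
        acc |= v << nbits
        nbits += width
        while nbits >= 8:          # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the final, partially filled byte
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_bits(data, width, count):
    """Inverse of pack_bits: recover `count` values of `width` bits each."""
    acc = 0
    nbits = 0
    mask = (1 << width) - 1
    values = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= width and len(values) < count:
            values.append(acc & mask)
            acc >>= width
            nbits -= width
    return values


int5_values = list(range(32))            # every 5-bit unsigned value once
one_per_byte = bytes(int5_values)        # naive: one byte per int5 value
packed = pack_bits(int5_values, 5)       # dense: 5 bits per value

print(len(one_per_byte), "bytes unpacked vs.", len(packed), "bytes packed")
```

Dense packing of this kind is one generic way to raise the share of useful bits on the link; the figures on the slide come from the authors' own messaging system and measurements.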
  • 88. How we get there. Bits: Mathematics + Information (today’s computers and supercomputers). Neurons: Biology + Information (today’s AI systems). Qubits: Physics + Information (today’s quantum systems). Source: The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
  • 89. o Extending performance by 2.5X / year through 2025 o Approximate computing principles applied to Digital AI Cores with reduced precision and Analog AI cores (non-von Neumann HW) o PCM devices can store synaptic weights in their analog conductance state. When PCM devices are arranged in a crossbar configuration, they can perform an analog matrix-vector multiplication in a single time step, exploiting multi-level storage capability and Kirchhoff’s circuit laws. Sources: https://analog-ai-demo.mybluemix.net/ o The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary o E. Eleftheriou et al., "Deep learning acceleration based on in-memory computing," in IBM Journal of Research and Development, vol. 63, no. 6, pp. 7:1-7:16, 1 Nov.-Dec. 2019 o https://www.research.ibm.com/artificial-intelligence/ai-hardware-center/ o https://www.ibm.com/blogs/research/2019/02/ai-hardware-center/, Feb. 2019 o https://www.technologyreview.com/2020/12/11/1014102/ai-trains-on-4-bit-computers/
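The crossbar statement can be made concrete with an idealized model: each device contributes a current equal to its conductance times the applied row voltage (Ohm's law), and the currents on each column wire add up (Kirchhoff's current law). The sketch below is a numerical illustration only; the function name `crossbar_mvm` and the conductance values are made up, and device non-idealities such as noise and drift are ignored.

```python
def crossbar_mvm(conductances, voltages):
    """Idealized crossbar: column current j = sum over rows i of G[i][j] * V[i].

    conductances: list of rows, one conductance per column (siemens).
    voltages: one input voltage per row (volts).
    Returns the list of column currents (amperes).
    """
    if len(conductances) != len(voltages):
        raise ValueError("need one voltage per crossbar row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for row, v in zip(conductances, voltages):
        for j, g in enumerate(row):
            currents[j] += g * v   # Ohm's law per device, summed per column
    return currents


# Hypothetical 2x2 crossbar: weights stored as conductances, inputs as voltages.
G = [[1.0, 2.0],
     [3.0, 4.0]]
V = [1.0, 0.5]
print(crossbar_mvm(G, V))
```

In the physical array all multiply-accumulate operations happen simultaneously in the analog domain, which is what the slide means by a single time step; the nested loops here only reproduce the result.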
  • 90. How we get there. Bits: Mathematics + Information (today’s computers and supercomputers). Neurons: Biology + Information (today’s AI systems). Qubits: Physics + Information (today’s quantum systems). Source: The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
  • 91. https://qiskit.org/ https://www.ibm.com/quantum-computing/experience/ At CES 2020, IBM research director Dario Gil gave the audience a primer on quantum computing and predicted that the industry will achieve quantum advantage this decade.
  • 92. Accelerated discovery: Bits + Neurons + Qubits. [Figure labels, in slide order: Deep Search, Intelligent Simulation, Generative Models, Autonomous Labs; Bits + Neurons, Bits + Qubits, Bits + Neurons, Bits + Neurons.] Source: The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary
  • 93. How we get there: the role of cloud FPGAs. 1. FPGAs are eligible to become 1st-class citizens ➢ Standalone approach sets the FPGA free from the CPU o Large-scale deployment of FPGAs independent of #servers o Significantly lowers the entry barrier ➢ Promotes the use of medium and low-cost FPGAs. 2. The network-attachment model ➢ Makes FPGAs IP-addressable and scalable in DCs o Users can rent and link them in any type of topology ➢ Opens the path for use of FPGAs in large-scale applications o Serverless computing, HPC, DNN inference, Signal Processing, … 3. The hyperscale infrastructure ➢ Integrates FPGAs at the chassis (aka drawer) level ➢ Combines passive and active water cooling ➢ Key enabler for FPGAs to become plentiful in DCs. Sources: The Future of Computing: Bits + Neurons + Qubits, Dario Gil and William M. J. Green, ISSCC2020: Plenary. FCCM2020 Workshop, THE FUTURE OF FPGA-ACCELERATION IN CLOUD AND DATA CENTERS: cloudFPGA: Promote FPGAs to 1st Citizen in the Cloud, Francois Abel, IBM Research Europe, http://www.fccm.org/proceedings/2020/Workshops/Future_of_FPGA_Workshop/cloudFPGA.pdf
  • 94. Research Ecosystem. There is no one-man-show. Team Collaboration: Burkhard Ringlein, Francois Abel, Beat Weiss, Mitra Purandare, Florian Auernhammer (OCAPI), Raphael Polig (ZYC2), Christoph Hagleitner, Mark Lantz. ZRL Collaboration: Florian Scheidegger, Cristiano Malossi (H2020 OPRECOMP). Eindhoven University of Technology: Gagandeep Singh, Sander Stuijk, Henk Corporaal; former NeMeCo ZRL colleagues: Jan van Lunteren, Ronald Luijten. Publications listed for these groups, in slide order: • COOLCHIPS2020, WiP: Automated precision-tuning methods for deep learning models on FPGAs and IoT devices. • ISCAS2018, COOLCHIPS2018, FPT2018, ASAP2019, SAMOS2019, DATE2019, RAW-IPDPS2020, FPL2019, FPL2020, FCCM2020, H2RC2020 • FPL2019, SAMOS2019, DATE2019, FPL2020. NeMeCo H2020 project, near-memory accelerators for weather modeling. ETH: Stefan Mach, Fabian Schuiki, Germain Haugou, Michael Schaffner, Frank K. Gurkaynak, and Luca Benini • COOLCHIPS2020, Transprecision PULP on IBM-ZRL Cloud. ETH: Kaan Kara (now Oracle), Dimitris Syrivelis (former IBM-Ireland colleague, now Nvidia), Gustavo Alonso • FPL2020, In-memory database acceleration (FPGA + OpenCAPI + HBM). Barcelona Supercomputing Center: Abbas Haghi, Lluc Alvarez, Santiago Marco, Miquel Moreto • FPL2020, Genomics acceleration with CAPI2-FPGA. IBM France – IBM China: Bruno Mesnet, Alexandre Castellane, Yong Lu (now cyansemi) • SNAP (CAPI1/2) and OC-Accel (OpenCAPI), OpenPOWER Summit 2018, 2019. ETH: Gagandeep Singh, Juan Gómez-Luna, Onur Mutlu • FPGA2021, Optimizing Near-memory accelerators with ML.
  • 95. What’s Next in FPGAs for cloud is agility: Infrastructure | Software | Automation | Composability
  • 96. ΣΚΕΨΟΥ! THINK BIG! IBM Research / Inventing What's Next / © 2020 IBM Corporation
  • 98. Transprecision Computing DL: constrained model synthesis for IoT applications. USER INPUT: IoT budget & requirements (inference time, memory size, device type) and dataset (images, sensors, audio). AUTOMATED ML MODEL SYNTHESIS FOR GIVEN EDGE DEVICE: CASE #1, CASE #2, CASE #3, … FPGA? OUTPUT: visual inspection, anomaly detection. References: Sood, A. et al. “NeuNetS: An Automated Synthesis Engine for Neural Network Design.” ArXiv abs/1901.06261 (2019). F. Scheidegger et al. “Constrained deep neural network architecture search for IoT devices accounting for hardware calibration”, NeurIPS2019. Call for EU-funded PhD: Deep Learning Algorithms for Budget Constrained Applications in the IoT Domain, https://tuni.rekrytointi.com/paikat/?o=A_RJ&jgid=3&jid=794
  • 99. PHRYCTORIA overview. Protocol Buffer serialization/deserialization between a byte-addressable enterprise system and a low-precision FPGA accelerator. Compatible with any gRPC-supported device/service. [Figure labels: synthetic FloatX dataset; NLP dataset; 6.3x-7.4x goodput BW; -4.8x MB; 6.9x goodput BW.] Source: PHRYCTORIA: A Messaging System for Transprecision OpenCAPI-attached FPGA Accelerators, D. Diamantopoulos et al., RAW-IPDPS2020
  • 100. What’s Next in FPGAs for cloud is agility, operationalized. “Best” choice depends on requirements for o Throughput (fps), o Latency (ms), o Energy efficiency (fps/watt), o Cost efficiency (fps/$), o Accuracy. Source: Survey and Benchmarking of Machine Learning Accelerators, 2019, MIT Lincoln Laboratory Supercomputing Center, https://arxiv.org/abs/1908.11348
  • 101. Autotuning of a Transprecision Tensor Accelerator. Instead of pruning the hardware design space, we propose a technique that builds a prediction model which quantifies the impact of each hardware design choice on an optimization goal. By using the most important features to generate an overlay, our auto-tuning achieves up to 2.5x higher performance and up to 8.1x faster convergence. Source: Agile Autotuning of a Transprecision Tensor Accelerator Overlay, D. Diamantopoulos et al., FPL2020
  • 102. The case of AI HW with GPUs & FPGAs. Is your neural network memory-bound or compute-bound for your NEW AI HW? How does it matter for the cloud? What does “AI HW diverseness” mean for an enterprise system? GPU memory sharing in containerized systems can lead to GPU performance inefficiencies that fall within the performance envelope of FPGAs, which operate on a power budget one order of magnitude lower. [Figure label: -53x.]
  • 103. Selected ML workloads
  • 104. Aggressive bit-width optimization for every AI HW device. Simulations establish which parts of an application can be mapped to lower precision without degrading accuracy. • DeepSpeech and Language Modeling (Euclidean distance compared to fp32 for 100% accuracy). • Yolov3: aggressive bit-width optimization so that classification accuracy on ImageNet is not less than 72.9% (top-1) and 91.2% (top-5). [Figure: distribution of the workloads to the lower-precision counterparts as a code-coverage percentage.]
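A toy version of such a precision search is sketched below. It is not the simulation flow used for these results: the uniform quantizer, the function names and the tolerance are assumptions picked for illustration. The idea is to quantize a reference vector at increasing bit widths and keep the smallest width whose Euclidean distance to the full-precision values stays within a tolerance.

```python
import math


def quantize(values, bits):
    """Uniformly quantize values onto 2**bits levels spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smallest_bit_width(values, tolerance, max_bits=16):
    """Smallest bit width whose quantization error is within `tolerance`,
    or None if no width up to max_bits qualifies."""
    for bits in range(1, max_bits + 1):
        if euclidean(values, quantize(values, bits)) <= tolerance:
            return bits
    return None


# Hypothetical reference data standing in for full-precision activations.
reference = [0.0, 0.25, 0.5, 0.75, 1.0]
chosen = smallest_bit_width(reference, tolerance=0.1)
print("smallest acceptable bit width:", chosen)
```

A real flow would measure task accuracy (for example top-1 and top-5 on ImageNet) instead of a distance on raw values, and would search per layer or per kernel rather than for one vector.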
  • 105. HelmGEMM measurements