1. 20-25% CAGR in market volumes
Competitive advantage hinges on speed,
transparency, and proximity to data
sources. The application must be in the
data path – seamlessly
Quest to balance risk/compliance with performance
HPC on Wall Street - 2012
2. 10GbE Switches for the
Virtualized Data Center,
but a software company
at the core
>1300 Customers
>325 Employees
Profitable, self-funded,
pre-IPO network
infrastructure provider
Open Linux-based OS
Fully automated testing,
and SW development
3. Arista Application Switch - 7124FX
• Couples ultra-low latency switch with next
generation programmable FPGA and memory
subsystem
• Customer programmable FPGA and Control Plane
provides total control over the network, forwarding,
inspection, redirection, etc.
• Targeted for early adopters of hardware
4. Exegy believes…
• Exegy believes in continually challenging the status quo of
market data delivery systems and trading platforms.
– First to market with hardware-accelerated market data appliances
based on FPGA technology.
– Best of breed solutions for major use cases faced by low-latency,
high-capacity consumers of financial market data feeds.
• Exegy believes that delivery and consumption of quality
market data should be as easy and painless as possible.
– Fully managed and constantly monitored appliances to assure
optimal performance and the best customer experience.
– A passion to help our customers succeed in the face of escalating
complexity and the increasing demands placed on them.
5. Impulse C, Custom FPGA-Accelerated Solutions for the Arista
7124FX
Brian Durwood, Co-founder
Converting C to multiple streaming hardware
processes ain’t that hard.
Focus on reducing clock cycles
Verify as you go
Iterate, iterate, iterate (no “magic button”)
The tool flow is a bit awkward for first timers.
Visual Studio or equivalent
Impulse C co-development, analysis & compile
Altera Quartus II for place & route into FPGA
Things you can do to get up to speed quickly:
Work from known good sw modules
Get up-front training or factory engineering
6. Programming With Impulse C
Not a new language
C language
Based on standard ANSI C applications
C-language for FPGA programming
For embedded and HPC applications
Supports standard C development tools
Supports multi-process partitioning
A software-to-hardware compiler
Optimizes C code for parallelism
Generates HDL, ready for FPGA synthesis
Also generates hardware/software interfaces
[Diagram: the compiler generates accelerator hardware (HDL files), hardware interfaces, and software interfaces (C software libraries) for Arista's on-board FPGA]
Purpose
Describe hardware accelerators using C
Move compute-intensive functions to FPGAs
www.ImpulseAccelerated.com
8. Custom FPGA-Accelerated
Solutions for the Arista 7124FX
Brian Durwood, Co-founder
Converting C to Multiple Streaming Hardware Processes
9. FPGAs – Advantages Over Software
Massive parallelism
At system level, loop level, instruction level
One FPGA can replace multiple CPUs
For specific tasks/algorithms, using much lower power
No need for a separate NIC
Enables inline processing at near line speed
Minimize OS interference in filtering
Especially during high transaction load events
Reduces jitter and other interference
Offloads standard CPUs with customized pre-processors
e.g. select limited analysis of X message types that meet X
criteria for X symbols
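The "X message types that meet X criteria for X symbols" pre-filter above can be sketched in plain C as follows. This is only an illustration of the filtering logic, not code from the presentation: the `msg_t` and `rule_t` layouts, the field names, and the single-rule design are all assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical normalized message; the layout is illustrative only. */
typedef struct {
    char     type;       /* made-up codes, e.g. 'A' = add order, 'T' = trade */
    char     symbol[8];  /* NUL-padded ticker */
    uint32_t shares;
} msg_t;

/* Hypothetical rule: one message type, one symbol, one size floor. */
typedef struct {
    char     type;
    char     symbol[8];
    uint32_t min_shares;
} rule_t;

/* Returns true when the message should be passed on to the host CPU. */
bool msg_passes(const msg_t *m, const rule_t *r)
{
    assert(m != NULL && r != NULL);
    return m->type == r->type
        && memcmp(m->symbol, r->symbol, sizeof m->symbol) == 0
        && m->shares >= r->min_shares;
}

/* Copies matching messages into out[] and returns how many were kept. */
size_t filter_msgs(const msg_t *in, size_t n, const rule_t *r, msg_t *out)
{
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        if (msg_passes(&in[i], r)) {
            out[kept++] = in[i];
        }
    }
    return kept;
}
```

Each comparison in `msg_passes` is independent of the others, which is the kind of logic that can be evaluated side by side in hardware rather than one test after another.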
10. 3 Popular FPGA Configurations
Usage option 1: Create a hardware module (a generated hardware module in the FPGA)
Usage option 2: Accelerate an embedded CPU (an embedded CPU core with generated hardware accelerators in the same FPGA)
Usage option 3: Accelerate an external/host CPU or computing cluster (a host processor or cluster attached to an FPGA coprocessor with generated hardware accelerators)
11. Configurations Can Be Combined
Combining streaming, embedded processor, and host processor
[Diagram: inside the FPGA, 10G Ethernet feeds stream processing and parsing, a matching algorithm and strategy block, and message generation; an embedded CPU handles configuration; embedded and shared RAM and a host connection sit alongside]
FPGA strategies can be coded using
C for hardware and for embedded
CPU, with shared RAM for hash table
lookup or other local data
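A hash table held in a fixed block of shared RAM, as mentioned above, might look like the following minimal sketch. It assumes a statically sized, open-addressing table with linear probing and no dynamic allocation; the names (`table_t`, `table_put`, `table_get`), the slot count, and the hash function are illustrative choices, not details from the presentation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TABLE_SLOTS 1024u  /* power of two, so wrapping is a cheap mask */

typedef struct {
    uint64_t key;    /* e.g. an order or instrument id (assumption) */
    uint32_t value;
    bool     used;
} slot_t;

/* Fixed-size table: a zero-initialized instance is an empty table. */
typedef struct {
    slot_t slots[TABLE_SLOTS];
} table_t;

static uint32_t slot_index(uint64_t key)
{
    /* Multiplicative hash: keep the high bits of the product. */
    return (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32) & (TABLE_SLOTS - 1u);
}

/* Inserts or overwrites; returns false only when the table is full. */
bool table_put(table_t *t, uint64_t key, uint32_t value)
{
    assert(t != 0);
    uint32_t i = slot_index(key);
    for (uint32_t probes = 0; probes < TABLE_SLOTS; probes++) {
        slot_t *s = &t->slots[i];
        if (!s->used || s->key == key) {
            s->key = key;
            s->value = value;
            s->used = true;
            return true;
        }
        i = (i + 1u) & (TABLE_SLOTS - 1u);  /* linear probe */
    }
    return false;
}

/* Looks a key up; returns false when it is not present. */
bool table_get(const table_t *t, uint64_t key, uint32_t *value_out)
{
    assert(t != 0 && value_out != 0);
    uint32_t i = slot_index(key);
    for (uint32_t probes = 0; probes < TABLE_SLOTS; probes++) {
        const slot_t *s = &t->slots[i];
        if (!s->used) {
            return false;  /* an empty slot ends the probe sequence */
        }
        if (s->key == key) {
            *value_out = s->value;
            return true;
        }
        i = (i + 1u) & (TABLE_SLOTS - 1u);
    }
    return false;
}
```

A fixed slot count and bounded probe loop keep the memory footprint and worst-case lookup time known in advance, which tends to matter more in hardware than average-case speed.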
12. Impulse C Programming Model
[Diagram: a network of connected hardware (H/W) and software (S/W) processes]
Communicating C-Language Processes
Supports dataflow and message-based communications
Supports parallelism at the application level and at the level of
individual processes
Allows simulation and
debugging of parallel
software processes.
13. Parallelism via Multiple Processes
Spatial parallelism
Temporal parallelism (system-level pipelining)
14. An Impulse C Process
Multiple methods of process-to-process communications
Shared memory: block reads/writes are supported
[Diagram: a C process with stream inputs and outputs, signal inputs and outputs, register inputs and outputs, shared memory, and App Monitor outputs]
Processes are independently synchronized
15. Compile and Optimize
Optimize the results using
interactive tools
Pipeline analysis
Loop unrolling
Instruction scheduling
Generate FPGA hardware
VHDL or Verilog
Low level interfaces to
memory, I/O and
busses.
ModelSim Test bench
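Loop unrolling, one of the optimizations listed above, can be illustrated in plain C. The sketch below unrolls a byte-summing loop by hand with four independent accumulators; whether a given C-to-hardware tool does this automatically or through directives is tool-specific and is not shown here. The function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled form: one addition per iteration, each depending on the last. */
uint32_t sum_rolled(const uint8_t *data, size_t n)
{
    uint32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Unrolled by four: the four partial sums do not depend on each other,
 * so they could be computed by four adders working at the same time.
 * For brevity this sketch requires n to be a multiple of 4. */
uint32_t sum_unrolled4(const uint8_t *data, size_t n)
{
    assert(n % 4 == 0);
    uint32_t a0 = 0, a1 = 0, a2 = 0, a3 = 0;
    for (size_t i = 0; i < n; i += 4) {
        a0 += data[i];
        a1 += data[i + 1];
        a2 += data[i + 2];
        a3 += data[i + 3];
    }
    return (a0 + a1) + (a2 + a3);  /* combine the partial sums at the end */
}
```

Both functions return the same value; the unrolled one trades more adders for fewer loop iterations, which is the clock-cycle reduction the analysis tools are meant to expose.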
16. Debug and Verify
Use C tools for application
debugging
Source-level debuggers
C-language testing
Test and analyze parallel
dataflow with the Impulse
Application Monitor
Automatically generate
VHDL or Verilog test benches
17. Constructs Familiar to C Programmers
Concept is similar to getc(), putc() in C for I/O
co_stream_create      Used in configuration
co_stream_open        Open the stream (clear eos)
co_stream_close       Close the stream (set eos)
co_stream_eos         Check end of stream (eos)
co_stream_read        Read from stream (with rdy, en)
co_stream_write       Write to stream (with rdy, en)
co_stream_read_nb     Non-blocking read (no rdy)
co_stream_write_nb    Non-blocking write (no rdy)
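The open / read-until-eos / close pattern in the table above can be mimicked in ordinary C. The sketch below is a plain-C stand-in for experimenting on a desktop, not the Impulse C API: every `mock_` name, the FIFO depth, and the single-threaded `double_process` function are assumptions made for this example, and the real `co_stream_*` signatures are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MOCK_DEPTH 8u  /* FIFO depth, arbitrary for this sketch */

typedef struct {
    uint32_t buf[MOCK_DEPTH];
    unsigned head, tail, count;
    bool     closed;  /* set by the writer: no more data will arrive */
} mock_stream;

void mock_open(mock_stream *s)
{
    s->head = s->tail = s->count = 0;
    s->closed = false;  /* opening clears end of stream */
}

void mock_close(mock_stream *s) { s->closed = true; }

/* End of stream: the writer has closed and the FIFO has drained. */
bool mock_eos(const mock_stream *s) { return s->closed && s->count == 0; }

/* Non-blocking write: fails instead of waiting when the FIFO is full. */
bool mock_write_nb(mock_stream *s, uint32_t v)
{
    if (s->closed || s->count == MOCK_DEPTH) return false;
    s->buf[s->tail] = v;
    s->tail = (s->tail + 1u) % MOCK_DEPTH;
    s->count++;
    return true;
}

/* Non-blocking read: fails instead of waiting when the FIFO is empty. */
bool mock_read_nb(mock_stream *s, uint32_t *v)
{
    if (s->count == 0) return false;
    *v = s->buf[s->head];
    s->head = (s->head + 1u) % MOCK_DEPTH;
    s->count--;
    return true;
}

/* A toy "process": read until end of stream, double each value, and
 * forward it. Returns the number of values forwarded. Because this
 * sketch is single-threaded, it stops as soon as a read or write
 * would have to wait. */
unsigned double_process(mock_stream *in, mock_stream *out)
{
    assert(in != 0 && out != 0);
    unsigned forwarded = 0;
    uint32_t v;
    mock_open(out);
    while (!mock_eos(in)) {
        if (!mock_read_nb(in, &v)) break;
        if (!mock_write_nb(out, v * 2u)) break;
        forwarded++;
    }
    mock_close(out);
    return forwarded;
}
```

The shape mirrors the getc()/putc() analogy: the consumer loops until end of stream, and closing the output is what lets the next stage in the chain terminate in turn.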
www.ImpulseC.com
18. Credible Solution in use by:
Multiple confidential, NDA-covered financial teams
19. Impulse Platform Support Package
Impulse produces CoDeveloper™
PSP generates HW/SW wrappers between the FPGA core and system elements
[Diagram: an FPGA fabric processing core connected to the embedded processor, memory resources, host interfaces, Ethernet, and other I/O]
Extensions (scripts and wrapper generators)
Platform-specific library functions
Documentation and tutorials
Current ready-to-run examples for the platform
20. Examples of FPGA processing:
Financial feed kernel bypass or Full
Hardware based trading
Direct handling of financial feeds
Parsing incoming feeds and triggering
outbound orders – your strategy in
hardware
Normalization or Protocol Conversion
Gateway sending a sub-feed of data
Pre-Trade Risk Checking
Low Latency Broker Dealer Compliance
Financial valuations
Co-processor off-loading for Monte Carlo
and other algorithms
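For the pre-trade risk checking use case listed above, a minimal sketch of the kind of check that could sit in the order path is shown below. The three limits (per-order quantity, per-order notional, and net position), the struct layouts, and all names are assumptions chosen for illustration; real compliance rules are considerably broader.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical limits for one account or strategy. */
typedef struct {
    uint32_t max_order_qty;       /* largest single order, in shares */
    int64_t  max_notional_cents;  /* largest single order value */
    int64_t  max_position;        /* largest absolute net position */
} risk_limits;

typedef struct {
    int      side;         /* +1 = buy, -1 = sell */
    uint32_t qty;
    int64_t  price_cents;
} order_t;

typedef enum { RISK_OK, RISK_QTY, RISK_NOTIONAL, RISK_POSITION } risk_result;

/* Checks one order against the limits, given the current net position.
 * Returns the first limit that is breached, or RISK_OK. */
risk_result risk_check(const risk_limits *lim, int64_t position,
                       const order_t *o)
{
    assert(lim != 0 && o != 0);
    assert(o->side == 1 || o->side == -1);

    if (o->qty > lim->max_order_qty) {
        return RISK_QTY;
    }
    if ((int64_t)o->qty * o->price_cents > lim->max_notional_cents) {
        return RISK_NOTIONAL;
    }
    int64_t next = position + (int64_t)o->side * (int64_t)o->qty;
    if (next > lim->max_position || next < -lim->max_position) {
        return RISK_POSITION;
    }
    return RISK_OK;
}
```

The checks are pure integer comparisons with no loops or memory allocation, which is the style of logic that maps most directly onto a fixed-latency hardware pipeline.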
21. Stand-Alone Feed Handling Solution
Usage option 3
[Diagram: a 1G or 10G Ethernet MAC connected through an RX adapter (Verilog) and a TX adapter (Verilog) to the feed handler and outbound UDP block (Impulse C)]
22. Network Processing Pipeline
UDP and TCP/IP implemented directly in FPGA hardware for low latency
[Diagram: FPGA containing the Ethernet 1/10GigE MAC, UDP parser, filter and/or TCP/IP stack, custom filtering application, embedded CPU, and host I/O interface; host system containing the driver, user application, and host memory]
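The UDP parser stage in this pipeline has to pull four 16-bit big-endian fields out of the standard 8-byte UDP header. A plain-C sketch of that step follows; the header layout is the standard one, while the function and type names are made up for this example and say nothing about how the Impulse reference design is actually written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UDP_HEADER_BYTES 8u

typedef struct {
    uint16_t src_port;
    uint16_t dst_port;
    uint16_t length;    /* header plus payload, in bytes */
    uint16_t checksum;
} udp_header;

/* Reads a 16-bit big-endian (network order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Parses the header at the start of buf. Returns false when the buffer
 * is too short or the length field is inconsistent with it. */
bool udp_parse(const uint8_t *buf, size_t len, udp_header *h)
{
    assert(buf != 0 && h != 0);
    if (len < UDP_HEADER_BYTES) {
        return false;
    }
    h->src_port = be16(buf);
    h->dst_port = be16(buf + 2);
    h->length   = be16(buf + 4);
    h->checksum = be16(buf + 6);
    return h->length >= UDP_HEADER_BYTES && h->length <= len;
}

/* A trivial filter stage: keep only datagrams for one destination port. */
bool udp_port_wanted(const udp_header *h, uint16_t port)
{
    return h->dst_port == port;
}
```

Filtering on the destination port right after parsing is the simplest way to discard unwanted traffic before it reaches the host interface.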
23. Complex Order Support
Standard FPGA or FPGA-based board
Incoming: exchanges, feed handlers, order data sources
  Direct connection, Impulse UDP/TCP
  Custom feed handler, replaces the NIC
  Formats, e.g.: ITCH, OUCH, OPRA, BATS, and generic UDP
  Adapters: RMDS, Bloomberg, and custom
  10 Gb/s Ethernet
Processing without OS
  Normalizing across feeds
  Produce sub-feed
  Revert feed to exchange formats
  Pull and present opportunities
  Hardwire potential X required responses
  Decompression and decryption
  Ultra-fast pattern matching
  Apply trade logic
  Manage risk
Outgoing: exchanges, feed handlers, order data sources
  Message management with exchanges
  Insert risk limitations awaiting confirm
24. Three Ways To Get Started
Learn the tools
Acquire an Impulse CoDeveloper license.
Work from the included reference designs.
Experiment with ways to optimize your algorithms to run efficiently as
multiple streaming processes in FPGA.
Turn Key System (“Bump in the Wire”)
License above +
UDP or other network attached FPGA-enabled reference design.
FPGA-based accelerator platform.
Impulse factory engineers to help get your system on line.
Turn Key System Running A Target Algorithm
License above + Turn Key System above +
Impulse Engineers, under NDA, refactor your target algorithm(s) for
efficient compilation to FPGA.
Impulse Engineers train your team on how the refactoring works.
25. About Impulse
Most widely used C-to-FPGA tool
Pure ANSI C
No PAR or HW statements inserted
Founded in 2002
By part of the original ABEL team
26. Additional Resources
Engineering consultation
info@ImpulseAccelerated.com
Tutorials:
www.ImpulseAccelerated.com/Tutorials
Book:
Practical FPGA Programming in C
27. Arista Application Switch – Systems Design
Compute, Storage, Memory, I/O, Application Acceleration –
Together
28. Platform Details
[Photo labels: console port, air vents, clock input, 16 base SFP/SFP+ ports, 8 FX SFP/SFP+ ports, USB port, management port]
24 wirespeed 1G/10G SFP/SFP+ ports
High Availability:
Dual hot-swappable power supplies
Multiple hot-swappable fan units
Designed for Data Center + Colocation:
Flexible front-to-rear or rear-to-front airflow
Choice of AC or DC power supplies
Application Switching for Cloud Networks
29. Arista Application Switch - 7124FX
Ultra Low Latency 24 port 10GbE Switch
• 16 10GbE ports connected to LLE ASIC
• 8 10GbE ports connected through Stratix V FPGA
• Built-in 50GB SSD
• Optional Chip-Scale Atomic Clock and External Clock Source
31. Financial Services Applications
Inline Risk Analysis: Low latency broker-dealer compliance
Feed Handling and A/B Arbitration: Offload line arbitration to dramatically improve application performance
Real-time Data Analysis: Instrument transaction performance at high resolution
Algorithmic Trading: Reducing system latency increases performance of trading strategies
Order Protocol Conversion: Convert or normalize multiple order entry formats to a common format
Order Execution Routing: Set order policies for best execution
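A/B line arbitration, listed above, means taking the same feed from two redundant lines and forwarding whichever copy of each message arrives first. The sketch below assumes every message carries a monotonically increasing sequence number, which is a common convention but not something the slides specify; the `arbiter` type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t next_seq;  /* lowest sequence number not yet forwarded */
    uint64_t skipped;   /* sequence numbers jumped over so far */
} arbiter;

typedef enum { ARB_FORWARD, ARB_DROP } arb_action;

void arbiter_init(arbiter *a, uint64_t first_seq)
{
    assert(a != 0);
    a->next_seq = first_seq;
    a->skipped = 0;
}

/* Offers a message from either line. The first copy of a sequence
 * number is forwarded; a copy at or below what was already forwarded
 * is dropped. This simplified version never reorders: once it jumps
 * ahead, a late copy of a skipped message is dropped too. A fuller
 * design would buffer briefly to let the other line fill the gap. */
arb_action arbiter_offer(arbiter *a, uint64_t seq)
{
    assert(a != 0);
    if (seq < a->next_seq) {
        return ARB_DROP;
    }
    if (seq > a->next_seq) {
        a->skipped += seq - a->next_seq;
    }
    a->next_seq = seq + 1;
    return ARB_FORWARD;
}
```

Because the decision needs only one comparison against a single counter, it can be made per packet without involving the host, which is what makes it a natural candidate for offload.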
March 19, 2011
33. Application Switch Development Partners
Complete integrated appliance model
• Novasparks: 100% hardware market data solution
• Exegy: Appliance-based robust ticker plant
System integrators and development support
• Impulse C: C to RTL tools
• Enyx: Customer trading solutions and IP blocks
34. Arista Application Switch 7124FX
A new category of product that provides a network
accelerated platform for high performance app
vendors to develop on
Combining a true network switch, with full routing and
switching protocols, and fully programmable hardware
creates a new market for the most demanding
applications
Application logic inserted into real-time environments
with complete transparency