Techno Arms Dealers and High Frequency Traders
Today represents the hottest time to be in financial markets – nanosecond response times, the ability to affect global markets in real time, and lucrative spot deals in dark pools are all the rage. For companies that do business in these times, it is a technical arms race, worthy of a Reagan-era analogy.
With High-Frequency Trading firms locked into an effective “Space Race”, the challenges for these firms are now far-reaching, extending beyond traditional regulatory, compliance, and governmental boundaries.
Even with the need to ensure that regulatory requirements are met, serious fines for non-compliance, and enforceable undertakings by third parties to halt trading activities on markets, the risks are still outweighed by the potential upside for combatant firms playing in the race.
Increasingly, the most marginal of technical errors can spell doom for market participants. In a market where risk is a prime concern and is often measured in millions of dollars, glitches are a regular occurrence, resulting in lost revenue, disappointed customers, and the swift destruction of once high-profile market leaders.
Recently, this was brought to the public’s awareness with the spectacular failure of Knight Capital: in August of 2012, erroneous trades were sent to the New York Stock Exchange, leading to the obliteration of nearly 60% of the firm’s value in under an hour.
The firm’s catastrophe has forced an attitude change among investors and corporate technology leadership, with a focus on compliance controls and board-level accountability. Tiny lapses in controls are expensive mistakes that lead to the disruption of markets; in conjunction with the immense losses and liability suits that often trail such events, the stakes are higher than ever to develop software in a controlled way and get it to market in the shortest time possible.
With regulatory changes imminent, the need for clear, actionable reporting at all levels of technology organizations demands a different approach from the traditional ones taken in the past.
The Landscape of Failure:
In the last two years alone, there have been numerous incidents in which technology misconfiguration sent markets awry. Institutional investors aside, failures in the mechanisms that govern software development for brokerage firms and markets have far-reaching and damaging consequences. From ill-prepared recovery protocols to poorly governed front, back, and middle offices, several noteworthy incidents in recent times have led to greater scrutiny of trading companies.
November 2012 – NYSE Euronext
A newly implemented market matching engine on UTP (the Universal Trading Platform, the core trading platform employed by the NYSE) caused a day-long disruption and forced the Big Board operator to establish closing prices for more than 200 stocks by falling back to its old system, Super Display Book (sDBK). Trading never resumed during the day for the 216 stocks affected, and the exchange determined the official closing price for each of the affected securities based on a consolidated reading of last-sale prices instead of the auction process normally used to close stocks; manual intervention was required to revalidate positions for venues and participants.
The Root Cause: Poor Testing, Poor Quality Assurance, Poor Release Management
2007 – 2010 London Stock Exchange (LSE): Multiple Outages & the Move to Linux
Over the course of a four-year period, the London Stock Exchange earned a reputation as the most unreliable exchange in the market. Multiple technology problems manifested in regular outages. In fact, the LSE ultimately had to change its entire operating stack to a new platform and institute a raft of new, mature processes to achieve the kind of reliability it needed.
August 2012 – Knight Capital
In the span of 45 minutes, a little over four hundred million dollars was lost when an algorithmic trading program designed for testing environments was released to the production environment. The blunder led to a seventy-five percent dip in the stock price within a 30-minute period before attempts to salvage the situation could be initiated. The error involved high-frequency trading (HFT) across up to 140 stocks, and is just the latest in a string of such errors.
The Root Cause: Poor Configuration Management, Inconsistent Testing Approach, Poor
Release Management
But How?
Most brokerages apply several layers of risk mitigation when developing and deploying software. I’ll give a high-level overview of a traditional approach in another post (I won’t go into the details of settlement, vetting, market matching, etc.). Trading firms are complex beasts, with multiple market participants, multiple exchanges, and a plethora of investment instruments in use, and going into detail on the actual technologies detracts from the message. What is apparent is that the process life-cycles used to achieve releases are governed by mechanisms from a different time and place, with varying, inconsistent controls that were not designed for rapid release schedules, leaving gaps in organizational capabilities that are open to failure.
The “New Old Ways” to Manage These Problems:
Application Lifecycle Management (ALM), a relatively recent play, is typically the means of ensuring that software remains relevant. A vital aspect of the Software Development Life-Cycle (SDLC), ALM is an integral part of ensuring that firms overcome the challenges of developing top-notch software at a fast pace. The new-wave premise of ALM follows a design, build, run mentality and pushes the paradigm to encompass all activities in the development cycle under one roof, whereas previous approaches often stitched together disparate best-of-breed solutions.
The benefits of this, with regard to trading systems, are clear. Greater visibility and consistency between tools implies more bug fixes and ultimately fewer glitches. The unfortunate reality is that underlying configurations are still not maintained well in this approach, and problems of this kind would not necessarily have been caught by traditional ALM technology vendors.
ITIL is a widely accepted approach to IT service management in these organizations. An ITIL-enabled process focusses centrally on what is called a Configuration Management Database (CMDB), which contains all information pertaining to an information system. It helps the organization identify and comprehend the relationships between system-level components and applications, and it is designed to track relationships between technology services and, at a micro level, items called CIs (Configuration Items). This process is known as configuration management, but as it typically lives in the operational part of the equation (Application Support, Infrastructure Operations & Service Management), it usually only gets invoked at a high level in the pre-production environments. There is another discipline, Software Configuration Management, which has applicable components in both ITIL and ALM; however, the tools and processes rarely meet, as the distinction between the disciplines is very much either software- or infrastructure-orientated.
The conceptual CMDB enables the specification and control of configuration items in a systematic and detailed manner, reducing configuration drift. As mentioned previously, problems with this approach manifest in the ITIL world because the CMDB typically does not converge with the version control repositories used in the development life-cycle, and more often than not is not version controlled itself – leaving further inconsistencies.
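To make the idea concrete, here is a minimal sketch (in Python, with hypothetical item names – not any particular CMDB product’s API) of how configuration items and their relationships might be modelled, and how a version-controlled baseline of CIs could be compared against the observed state of an environment to flag drift.

```python
from dataclasses import dataclass, field

@dataclass
class ConfigurationItem:
    """A single CI: a deployable unit or platform component and its recorded state."""
    name: str
    attributes: dict                                  # e.g. version, checksum, key settings
    depends_on: list = field(default_factory=list)    # names of related CIs

def detect_drift(baseline: dict, observed: dict) -> list:
    """Compare a version-controlled baseline of CIs against what is actually
    present in an environment, and return human-readable differences."""
    findings = []
    for name, expected in baseline.items():
        actual = observed.get(name)
        if actual is None:
            findings.append(f"{name}: missing from environment")
        elif actual.attributes != expected.attributes:
            findings.append(f"{name}: drifted (expected {expected.attributes}, "
                            f"found {actual.attributes})")
    for name in observed.keys() - baseline.keys():
        findings.append(f"{name}: present in environment but not in the baseline")
    return findings

# Hypothetical example: a gateway application CI and the host it depends on.
baseline = {
    "order-gateway": ConfigurationItem("order-gateway", {"version": "2.3.1"}, ["prod-host"]),
    "prod-host":     ConfigurationItem("prod-host", {"kernel": "2.6.32"}),
}
observed = {
    "order-gateway":    ConfigurationItem("order-gateway", {"version": "2.3.0"}, ["prod-host"]),
    "prod-host":        ConfigurationItem("prod-host", {"kernel": "2.6.32"}),
    "market-simulator": ConfigurationItem("market-simulator", {"version": "1.0"}),
}

for finding in detect_drift(baseline, observed):
    print(finding)
```

Run against the example data above, the check flags the drifted gateway version and the stray simulator – exactly the class of inconsistency that goes unnoticed when the CMDB and the development repositories never meet.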
Okay, Okay, We Get That – So What Went Wrong at Knight?
Basically, Knight accidentally released the simulation software it used to verify that its market-making software functioned properly into NYSE’s live system.
Within Knight Capital’s development environments lived a software program called a “market simulator”, designed to send spread patterns of buy and sell orders to its counterpart market-making software, called RLP in this case. The trade executions were recorded and potentially used for performance validation prior to new releases of the market-making software. This is probably how Knight could stress test how well its new market-making software worked under load before deploying it to the system connected to the NYSE live market.
Prior to August 1st, a number of teams would have progressively migrated software between environments for release into the “live” environment. Potentially, a manual process was caught up in the deployment and pushed a copy of the simulation software into “live”. Most companies do not employ baseline configuration tests in the later environment stages, so (probably at a later stage in the process) someone opted to add the program to the release package and deployed it.
This is exacerbated in large teams, and is simply an overhang of the fact that typically no one team owns the configuration state of both the applications and the operating systems/platforms they run on. The closest is usually the systems administration team, but as they have a production environment to manage, these “lesser” environments get sidelined behind more important problems. Combined with the fact that there are very few tools that actually focus on configuration testing – people instead use collections of scripts or home-brew solutions – it is easy to see where this went wrong.
The lack of a well-defined configuration baseline, and of a set of configuration tests covering the differences between environments, is the likely cause of the problem (well, from an outsider’s perspective).
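As an illustration of what such a baseline configuration test could look like, here is a minimal sketch in Python that compares the artifacts actually present in a deployment directory against a version-controlled release manifest. The paths and file names are hypothetical; the point is simply that a stray, test-only component would be flagged before (or immediately after) go-live.

```python
import os

# Hypothetical, version-controlled release manifest: the only artifacts
# that are allowed to exist in the production deployment directory.
RELEASE_MANIFEST = {
    "market_maker.jar",
    "order_router.jar",
    "config/prod.properties",
}

def baseline_check(deploy_dir: str, manifest: set) -> list:
    """Walk the deployment directory and report anything not in the manifest,
    plus anything in the manifest that failed to deploy."""
    deployed = set()
    for root, _dirs, files in os.walk(deploy_dir):
        for f in files:
            rel = os.path.relpath(os.path.join(root, f), deploy_dir)
            deployed.add(rel.replace(os.sep, "/"))

    unexpected = sorted(deployed - manifest)   # e.g. a test-only simulator jar
    missing    = sorted(manifest - deployed)   # artifacts that never arrived
    return ([f"UNEXPECTED artifact in production: {p}" for p in unexpected] +
            [f"MISSING artifact from release: {p}" for p in missing])

if __name__ == "__main__":
    problems = baseline_check("/opt/trading/prod", RELEASE_MANIFEST)
    for p in problems:
        print(p)
    raise SystemExit(1 if problems else 0)
```

The same comparison run against each environment in the release chain also surfaces the differences between them, which appears to be precisely the information that was missing.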
On the morning of August 1st, the release was successfully deployed, and the simulator inadvertently bundled with it was ready to do its job: exercise the market-making software.
This time, however, it was no longer in one of the test environments; it was executing live trades on the market, with real orders and real dollars.
For stocks where Knight was the only one running market-making software as an RLP, and the simulator was the only algo crossing the bid/ask spread, we saw consistent buy and sell patterns of trade executions – all marked regular, all from the NYSE, and all occurring at prices just above the bid or just below the ask.
Examples include EXC and NOK, and you can see these patterns in charts here. The simulator was functioning just as it did in the test environments, and Knight’s market-making software was intercepting these orders and executing them. Knight’s net loss on simple volumes is minor; on this day, however, the problem was compounded, as the software kept operating and generating a lot of wash sales.
For stocks where Knight was not the only market-maker, or where other algorithmic trading software was actively trading (and crossing the bid/ask spread), some or all of the orders sent by the simulator were executed by someone other than Knight, and Knight now had a position in the stock – meaning it could have been making or losing money. The patterns generated for these stocks depended greatly on the activity of the other players.
Because the simulator was buying indiscriminately at the ask and selling at the bid, and because bid/ask spreads were very wide during the open, we now understand why many stocks moved violently at that time. The simulator was simply hitting the bid or the offer, and the side it hit first determined whether the stock opened sharply up or down.
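To illustrate the mechanics with made-up numbers (these are not Knight’s actual figures): a strategy that buys at the ask and sells at the bid pays the full spread on every round trip, so with wide spreads at the open the losses accumulate very quickly.

```python
# Hypothetical figures for illustration only.
bid, ask = 24.90, 25.10          # a 20-cent spread at the open
shares_per_order = 100
round_trips_per_second = 10      # waves of buy/sell orders
seconds = 60

loss_per_round_trip = (ask - bid) * shares_per_order   # pay the spread each time
total_loss = loss_per_round_trip * round_trips_per_second * seconds

print(f"Loss per round trip: ${loss_per_round_trip:.2f}")
print(f"Loss after one minute of crossing the spread: ${total_loss:,.2f}")
# -> $20.00 per round trip, $12,000.00 per minute, per symbol, at these made-up rates.
```

Scale that across well over a hundred symbols and 45 minutes of trading, and it is easy to see how the losses mount.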
Since the simulator didn’t think it was dealing with real dollars, it didn’t have to keep track of its net position. Its job was to send buy and sell orders in waves across pre-defined positions.
This explains why Knight didn’t know right away that it was losing a lot of money.
They didn’t even know the simulator was running.
When they realized they had a problem, the first suspect was likely the new market-making software. We think the two periods when there was a sudden drop in trading (9:48 and 9:52 AM) are when they restarted the system. Once it came back, the simulator, being part of the package, fired up and continued trading positions. Finally, just moments before a news release at 10 AM, someone found and killed the simulator.
We can fully appreciate the nightmare their team must have experienced that morning: a lack of visibility, inconsistent sources of what was actually running in production, and little insight into what the release had actually shipped.
Regulated Controls Against Flash Crashes
Like those that came before it, Knight Capital was once THE retail market-maker in the US; its reputation has now been irreparably damaged. It’s prudent to note that the error was largely avoidable, had the relevant controls been put in place.
Several factors played into this scenario, namely:
- Poor configuration management
- Loose controls around the release management process within the firm
- A lack of visibility into the makeup of the changes being introduced into the market
- An inability to isolate the configurations that were deployed
- A lack of configuration testing
- A lack of operational acceptance testing
Automated Governance is the Way Forward
DevOps, a recent answer to the challenges of collaboration across the release cycle, stresses the seamless integration of software development and collaboration between IT teams, with a view towards enabling rapid rollout of products via automated release mechanisms. It recognizes the existing gap between activities considered part of the development life-cycle and those characterized as operational activities. Historically, the separation of development and operations has manifested itself as a form of conflict – as can be clearly seen in the sheer number of frameworks developed to address the problem – and it ultimately predisposes entire systems to errors.
What is currently lacking in each approach is a mechanism to gather systems knowledge in environments where skills and capabilities vary significantly between teams.
For orchestration and deployment, Puppet, Chef, BladeLogic, and Electric Cloud go a long way towards improving upon the existing configuration components of ALM models, but they often neglect the interaction with ITIL. Puppet has been making strides in recent months with integrations into tools of this nature. Yet the existing suites of tools require specific knowledge of declarative domain-specific languages to enable a user to describe system resources and their state. In the case of Puppet, discovery of system information and compilation into a usable format is possible, but it is a daunting task for a novice user in these fast-paced corporate environments.
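To give a feel for the declarative model these tools share, here is a minimal sketch in Python (deliberately not Puppet’s actual DSL) in which the desired state of a few resources is declared as data and a generic checker reports whether the observed system matches. The resource names and the discovery stub are hypothetical.

```python
# Desired state declared as data, in the spirit of a declarative DSL.
DESIRED_STATE = [
    {"type": "package", "name": "openjdk-7-jdk",                "ensure": "installed"},
    {"type": "service", "name": "order-gateway",                "ensure": "running"},
    {"type": "file",    "name": "/etc/trading/prod.properties", "ensure": "present"},
]

def observe(resource: dict) -> str:
    """Stand-in for real discovery (package manager queries, service status,
    file checks). Here it simply pretends everything except the service is fine."""
    if resource["type"] == "service":
        return "stopped"
    return resource["ensure"]

def validate(desired: list) -> list:
    """Compare each declared resource with its observed state and report drift."""
    findings = []
    for resource in desired:
        observed_state = observe(resource)
        if observed_state != resource["ensure"]:
            findings.append(f'{resource["type"]} {resource["name"]}: '
                            f'expected {resource["ensure"]}, found {observed_state}')
    return findings

for finding in validate(DESIRED_STATE):
    print(finding)
```

The declaration itself is easy to read back and version control; the hard part in practice is the discovery and compilation step hinted at by observe(), which is exactly where novice users struggle.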
Over time, heavily regulated environments governed by strict auditing requirements must put in place a validation mechanism that can be maintained and used across the varying capability levels of an organization, ensuring that configuration drift between environments is caught early and reported back.
Increasingly smart automations will be deployed to ensure that state is maintained through testing, recording, and safe auto-provisioning. This is a unique means of peer-based systems configuration and a measure of prevention before configuration errors affect running systems (think of it as configuration-aware systems), and one that very few companies are experimenting with.
Our own tool, ScriptRock, complements the existing workflow tools and offers the simplest way for Developers, Configuration Managers & Systems Administrators to gain real-time validation of configuration state, to great effect. It enables the creation and running of configuration tests, collaborative configurations for teams, and (in coming months) a robust community option, as well as the creation of detailed documents that act as reports to satisfy audit standards. Applying ScriptRock to these environments ensures fast process maturity for developing seamless system configurations and requires no new syntax or code; everything is available as a version-controlled test that can be executed under strict security contexts on the target system.
Governing Bodies
The Knight crisis is not an isolated event. However, it has been taken as a rallying call for greater visibility into the processes and compliance measures implemented within trading participants.
With the increasing complexity of trading algorithms, which are the backbone of trading
procedures, the necessity of controls to govern these technology organizations is
becoming more apparent each day.
Mary Schapiro, the outgoing Chair of the SEC, called for a review of the SEC’s automation review policies, which were put in place with exchanges after the 1987 market crash and require venues to notify the regulator of trading failures or security lapses. Portions of those policies will serve as the basis for the new rules.
The implementation of a powerful trading platform rests on many pillars. Their remarkable effectiveness has led to a reliance on legacy solutions to deal with the rapid release schedules that firms now face in order to stay recognized as leading systems. This comes at a cost: the increased pressure to deliver innovation has exposed these systems – and, more importantly, the processes and tools that govern them – to the risk of failure when glitches occur. As a result, the concepts outlined in DevOps are clearly necessary to continue delivering the key features and components of financial markets, and proper execution will help avert crises such as the Knight fiasco in future.
The comprehension and adoption of the various frameworks, the integration of IT automation, and clear governance of development and operational environments will go a long way towards ensuring that a fiasco such as the Knight crisis remains solely a problem of the past, never to be replicated. Unfortunately, we still have a long way to go on this journey.