Getting the most out of multi-year
and multi-source trading history
Glenn Wright, EMEA Systems Architect DDN
June 2014
© 2013 DataDirect Networks, Inc.
ddn.com
Agenda
Uh? Who is DDN?
The Evolution of Data Handling in Market Systems
The Big Analytics Crunch
What’s hot, what’s not… It’s Parallel Performance, stupid!
DDN | The “Big” In Big Data
800%
PayPal accelerates
stream processing
and fraud analytics
by 8x with DDN,
saves $100Ms.
1TB/s
The world’s fastest
file system, to power
the US’s fastest
supercomputer, is
powered by DDN.
Tier 1
Tier 1 CDN accelerates
the world’s video traffic
using DDN technology
to exceed customer
SLAs.
DDN | The Technology Behind The World’s Leading
Data-Driven Organizations
HPC &
Big Data Analysis
Cloud &
Web Infrastructure
Professional
Media
Security
Big Data & Cloud Infrastructure
DDN’s Award-Winning Product Portfolio
Analytics
Reference
Architectures
EXAScaler™
10Ks of Clients
1TB/s+, HSM
Linux HPC Clients
NFS & CIFS [2014]
Petascale
Lustre® Storage
Enterprise
Scale-Out File
Storage
GRIDScaler™
~10K Clients
1TB/s+, HSM
Linux/Windows HPC Clients
NFS & CIFS
SFA™12KX
48GB/s, 1.7M IOPS
1,680 Drives in 2 Racks
Optional Embedded Computing
SFA7700
12.5GB/s, 450K IOPS
60 Drives in 4U
228 Drives in 12U
Storage Fusion Architecture™ Core Storage Platforms
SATA SSD
Flexible Drive Configuration
SAS
SFX™ Automated Flash Caching
WOS® 3.0
32 Trillion Unique Objects
Geo-Replicated Cloud Storage
256 Million Objects/Second
Self-Healing Cloud
Parallel Boolean Search
Cloud Foundation
Big Data Platform
Management
DirectMon™
Cloud
Tiering
Infinite Memory Engine™ [Tech Preview]
Distributed File System Buffer Cache
WOS7000
60 Drives in 4U
Self-Contained Servers
Adaptive Transparent Flash Cache
SFX API Gives Users Control
[pre-staging, alignment, by-pass]
Evolution of Market Systems
[Chart: yearly values, 1990 to 2011, for TOTAL, Americas and Asia - Pacific; y-axis from 0 to 160,000,000. Chart annotations: DASD, F DASD, Scale-out NAS, Parallel File System.]
SOURCE: World Federation of Exchanges 2011 Annual Report and Statistics
UNDERLYING ISSUE:
Gaping Performance Bottlenecks
• Moore’s Law has outstripped improvements to
disk drive technology by two orders of
magnitude during the last decade
• Analytics moved to HPC clusters
• Today’s servers are hopelessly unbalanced
between the CPU’s need for data and the
HDD’s ability to keep up
[Chart: HDD vs. CPU Relative Performance Improvement, 2005 to 2014; chart labels: 20,000 x, 1gb, 16gb]
Welcome to the Big Analytics Crunch
• 500TB to > 2PB of historical data for one TZ
• Distributed cache : online model reads data at 100s of GB/s IO
(Tick DB application such as kdb+)
• 3D “cube” of in-memory distributed data, online, real-time
• 100s of services/servers working together in memory: low
latency analytics w/ simplicity of persistent file system semantics
• Burst buffer low latency operation mainstream in FSI
► Real-time back testing
► Real-time intra-day risk positioning
Why DDN & Why Parallel?
In Production
Many systems deployed worldwide
@ Global Investment Banks and Hedge Funds
Performance and Consolidation
A back test in a few seconds is much closer to the trade event
Mix online history and real-time trade analytics
Consolidate in-memory databases against one copy of data
At Scale
Flash – is NOT scale @ capacity
Single namespace, history and real-time
Limitless Scale up and Scale out with kdb+…
[Diagram: a compute fabric of kdb+ nodes, KDB+ (1) to KDB+ (16), attached to a parallel file system made up of MDS Primary and MDS Replica with an MDT, plus OSS1 to OSS4, on two DDN SFA7700 arrays]
What we changed:
export SLAVECOUNT=160 # number of kdb+ client tasks
export CLIENTCOUNT=10 # number of processes per kdb server
q script query:
\l beforeeach.q
R1S:rrdextras flip`k`v!(" S*";",")0:`:rrd.csv
/ year-hibid
outp t:"YRHIBID";
fn:{[f;s;d] flip`date`sym`a!flip raze(f each s)peach d};
NRS:.tasks.rxsg[H;`$t;1;(fn[hb];apickAs[R1S;`Symbol];reverse ALLDATES2011)];
\l aftereach.q
symbols:
glenn$ head rrd.csv
1,Symbol,LKQQ
1,Symbol,LHDE
1,Symbol,LNJO
1,Symbol,LLTR
1,Symbol,LRFC
1,Symbol,LQGA
1,Symbol,LTNQ
1,Symbol,LSAG
1,Symbol,LQIA
1,Symbol,LKSJ
… x850 symbols vs 84
glenn$ more hostport.txt
127.0.0.1:5000
127.0.0.1:5001
127.0.0.1:5002
127.0.0.1:5003
127.0.0.1:5004
127.0.0.1:5005
127.0.0.1:5006
127.0.0.1:5007
127.0.0.1:5008
127.0.0.1:5009
What we changed (2):
# replace $QEXEC initdb.k -g 1 -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
for i in `seq 20000 20009`
do
for j in `seq 0 15`
do
echo ssh server-$j "cd $HOME;QHOME=/home/glenn/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
ssh gp-2-$j "cd $HOME;QHOME=/home/mpiuser/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
while ! nc -z "gp-2-$j" $i; do sleep 0.1; done
done
done
# get ready ??
echo `date -u` $SLAVECOUNT slave tasks started
# then start the servers aimed at the slaves
baseport=5000
for ((i=0; i<$CLIENTCOUNT; i++));
do
$QEXEC initdb.k -g 1 -s -$SLAVECOUNT -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
while ! nc -z localhost $((baseport+i)); do sleep 0.1; done
done
# check that everything can startup : $QEXEC startdb.q -s -$SLAVECOUNT -q
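The startup loops above poll each port with `nc -z` until it answers, with no upper bound on the wait. A minimal sketch of a bounded variant is shown here. The helper name `wait_for` is hypothetical and not part of the original scripts, and the retry count is an arbitrary choice.

```shell
# wait_for is a hypothetical helper, not part of the original scripts.
# Usage: wait_for TRIES CMD [ARGS...]
# Runs CMD up to TRIES times, sleeping 0.1s after each failed attempt.
# Returns 0 as soon as CMD succeeds, or 1 if every attempt fails.
wait_for() {
  _tries=$1
  shift
  while [ "$_tries" -gt 0 ]; do
    if "$@"; then
      return 0
    fi
    _tries=$((_tries - 1))
    sleep 0.1
  done
  return 1
}
```

In the launch script, a call such as `wait_for 100 nc -z localhost $((baseport+i)) || exit 1` would give up after roughly ten seconds instead of hanging if a q process fails to start.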
What we changed (3):
startdb.q …
…
/ check all servers are there
/{hopen(x;500)}each("I"$getenv`BASEPORT)+til"I"$getenv`SLAVECOUNT;
{hopen(x;2500)}each hsym`$read0`:slavehostport.txt;
\l initdb.k
{hopen(x;500)}each 5000+til"I"$getenv`CLIENTCOUNT;

cat slavehostport.txt:
192.168.3.51:20000
192.168.3.51:20001
192.168.3.51:20002
192.168.3.51:20003
192.168.3.51:20004
192.168.3.51:20005
192.168.3.51:20006
192.168.3.51:20007
192.168.3.51:20008
192.168.3.51:20009
192.168.3.52:20000
192.168.3.52:20001
192.168.3.52:20002
…. 160 times
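The 160 entries in slavehostport.txt follow a regular pattern: 16 slave hosts times 10 ports (20000 to 20009), matching the `seq` loops in the launch script. The file could be generated rather than written by hand, as in the sketch below. The function name `gen_hostports` is hypothetical. The listing shows only 192.168.3.51 and 192.168.3.52, so treating the 16 hosts as consecutive addresses is an assumption.

```shell
# gen_hostports is a hypothetical helper, not part of the original scripts.
# Usage: gen_hostports PREFIX FIRST_OCTET HOSTS BASEPORT PORTS_PER_HOST
# Prints one "ip:port" line per slave process, grouped by host.
# ASSUMPTION: hosts sit at consecutive addresses PREFIX.FIRST_OCTET onwards.
gen_hostports() {
  prefix=$1 first=$2 hosts=$3 baseport=$4 per_host=$5
  h=0
  while [ "$h" -lt "$hosts" ]; do
    p=0
    while [ "$p" -lt "$per_host" ]; do
      echo "$prefix.$((first + h)):$((baseport + p))"
      p=$((p + 1))
    done
    h=$((h + 1))
  done
}
```

Under that assumption, `gen_hostports 192.168.3 51 16 20000 10 > slavehostport.txt` would write the 160 lines that startdb.q reads with `read0`.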
[Diagram: Q clients, Slave (1) to Slave n with 10 slave processes each, all using the Lustre/DDN service mounted at /mnt/onefilesystem]
Up to 1TB/sec… “n” way server striping or by date/sym
Results of Scaling the service ….
[Chart: Latency reduction (number of seconds for query), lower is better; y-axis from 0 to 250; bars: Single Thread, Lustre]
The Parallel FS solution shows
near-linear scalability for
one instance running over many
nodes, as measured from kdb+.
Latency is the time spent waiting for
a kdb+ query over 245GB of data.
To put this in context, these nodes
were equipped with only 64GB of
memory.
Some of the many Benefits of kdb+ on Parallel FS
1. Significant decrease in operational latency per kdb+ query, especially when running queries that search
through large amounts of historical market information. Achieved by balancing content across
multiple file system servers.
2. Parallelization of kdb+ query “threads” in a single shared namespace, allowing a user to treat any data
workload independently from other data workloads. Is a “query from hell” on a production system now OK?
3. Simultaneous read/write operations on a single namespace for the entire database and for any number
of kdb+ clients (e.g. end-of-day data consolidations into an hdb instance).
4. Sharing of data amongst different independent hdb/rdb instances. Many instances of kdb+ can view the
same data, meaning that strategies for data sharing and private data segments may be
consolidated onto the same space. Avoids the need for kdb+ admins to physically copy data around
the network or disks.
5. kdb+ content can be “striped” across all FS servers, or can be allocated in a round-robin fashion against
each server. Striping allows some files to attain maximal I/O rates for a single kdb+
“object”.
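On a Lustre-based setup, the two layouts in point 5 would typically correspond to the stripe count set with `lfs setstripe`. The slides do not show the striping commands that were used, so the following is only a dry-run sketch: it prints the commands rather than running them. The function name `stripe_plan` and the directory names under /mnt/onefilesystem are hypothetical.

```shell
# Dry-run sketch: prints Lustre commands instead of executing them.
# stripe_plan and the directory names below are hypothetical.
# Usage: stripe_plan MODE DIR   (MODE is "wide" or "single")
stripe_plan() {
  mode=$1 dir=$2
  case "$mode" in
    wide)   count=-1 ;;  # -1 stripes each new file over all available OSTs
    single) count=1 ;;   # one OST per file; Lustre spreads files across OSTs
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  echo "lfs setstripe -c $count $dir"
}

stripe_plan wide /mnt/onefilesystem/hdb_wide
stripe_plan single /mnt/onefilesystem/hdb_perfile
```

A stripe count set on a directory applies to files created in it afterwards, so the choice is best made before the historical data is loaded. `lfs getstripe` on a file reports the layout it actually received.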
Next Steps?
Thanks!
gwright@ddn.com
www.ddn.com
/Big Data
AquaQ Analytics Kx Event - Data Direct Networks Presentation

  • 1. Getting the most out of multi-year and multi-source trading history Glenn Wright, EMEA Systems Architect DDN June 2014
  • 2. © 2013 DataDirect Networks, Inc. ddn.com Agenda: Uh? Who is DDN?; The Evolution of Data in Data Handling Market Systems; The Big Analytics Crunch; What’s hot, what’s not… It’s Parallel Performance, stupid!
  • 3. DDN | The “Big” In Big Data
      • 800%: PayPal accelerates stream processing and fraud analytics by 8x with DDN, saves $100Ms.
      • 1TB/s: The world’s fastest file system, to power the US’s fastest supercomputer, is powered by DDN.
      • Tier 1: Tier 1 CDN accelerates the world’s video traffic using DDN technology to exceed customer SLAs.
  • 4. DDN | The Technology Behind The World’s Leading Data-Driven Organizations: HPC & Big Data Analysis; Cloud & Web Infrastructure; Professional Media; Security
  • 5. Big Data & Cloud Infrastructure: DDN’s Award-Winning Product Portfolio
      • Analytics Reference Architectures
      • EXAScaler™ (Petascale Lustre® Storage): 10Ks of Clients; 1TB/s+, HSM; Linux HPC Clients; NFS & CIFS [2014]
      • GRIDScaler™ (Enterprise Scale-Out File Storage): ~10K Clients; 1TB/s+, HSM; Linux/Windows HPC Clients; NFS & CIFS
      • Storage Fusion Architecture™ Core Storage Platforms: SFA™12KX (48GB/s, 1.7M IOPS; 1,680 Drives in 2 Racks; Optional Embedded Computing); SFA7700 (12.5GB/s, 450K IOPS; 60 Drives in 4U; 228 Drives in 12U); Flexible Drive Configuration (SATA, SSD, SAS); SFX™ Automated Flash Caching
      • WOS® 3.0: 32 Trillion Unique Objects; Geo-Replicated Cloud Storage; 256 Million Objects/Second; Self-Healing Cloud; Parallel Boolean Search
      • WOS7000: 60 Drives in 4U; Self-Contained Servers
      • Cloud Foundation; Big Data Platform; Management: DirectMon™; Cloud Tiering
      • Infinite Memory Engine™ [Tech Preview]: Distributed File System Buffer Cache
      • Adaptive Transparent Flash Cache; SFX API Gives Users Control [pre-staging, alignment, by-pass]
  • 6. Evolution of Market Systems [Chart: 1990 to 2011, y-axis 0 to 160,000,000; series: TOTAL, Americas, Asia - Pacific; storage eras annotated: DASD, F DASD, Scale-out NAS, Parallel File System.] SOURCE: World Federation of Exchanges 2011 Annual Report and Statistics
  • 7. UNDERLYING ISSUE: Gaping Performance Bottlenecks
      • Moore’s Law has out-stripped improvements to disk drive technology by two orders of magnitude during the last decade
      • Analytics moved to HPC clusters
      • Today’s servers are hopelessly unbalanced between the CPU’s need for data and the HDD’s ability to keep up
      [Chart: HDD vs. CPU Relative Performance Improvement, 2005 to 2014; labels: 20,000 x, 1gb, 16gb.]
  • 8. Welcome to the Big Analytics Crunch
      • 500TB to > 2PB of historical data for one TZ
      • Distributed cache: online model reads data at 100s of GB/s IO (Tick DB application such as kdb+)
      • 3D “cube” of in-memory distributed data, online, realtime
      • 100s of services/servers working together in memory: low-latency analytics w/ simplicity of persistent file system semantics
      • Burst buffer low-latency operation mainstream in FSI
        ► Real-time back testing
        ► Real-time intra-day risk positioning
  • 9. Why DDN & Why Parallel?
      • In Production: Many systems deployed worldwide at global investment banks and hedge funds
      • Performance and Consolidation: Back test in a few seconds is much closer to the trade event; mix online history and real-time trade analytics; consolidate in-memory databases against one copy of data
      • At Scale: Flash is NOT scale @ capacity; single namespace, history and real-time
  • 10. Limitless Scale up and Scale out with kdb+… [Diagram: Compute Fabric; KDB+ (1), KDB+ (2), KDB+ (3) … KDB+ (16); MDS Primary, MDS Replica, MDT; OSS1, OSS2, OSS3, OSS4; two DDN SFA7700 units.]
  • 11. What we changed:
      export SLAVECOUNT=160 # number of kdb+ client tasks
      export CLIENTCOUNT=10 # number of processes per kdb server
      Q script Query:
      \l beforeeach.q
      R1S:rrdextras flip`k`v!(" S*";",")0:`:rrd.csv
      / year-hibid
      outp t:"YRHIBID";
      fn:{[f;s;d] flip`date`sym`a!flip raze(f each s)peach d};
      NRS:.tasks.rxsg[H;`$t;1;(fn[hb];apickAs[R1S;`Symbol];reverse ALLDATES2011)];
      \l aftereach.q
      symbols:
      $glenn head rrd.csv
      1,Symbol,LKQQ
      1,Symbol,LHDE
      1,Symbol,LNJO
      1,Symbol,LLTR
      1,Symbol,LRFC
      1,Symbol,LQGA
      1,Symbol,LTNQ
      1,Symbol,LSAG
      1,Symbol,LQIA
      1,Symbol,LKSJ
      … x850 symbols vs 84
  • 12. glenn$ more hostport.txt
      127.0.0.1:5000
      127.0.0.1:5001
      127.0.0.1:5002
      127.0.0.1:5003
      127.0.0.1:5004
      127.0.0.1:5005
      127.0.0.1:5006
      127.0.0.1:5007
      127.0.0.1:5008
      127.0.0.1:5009
      What we changed (2):
      # replace $QEXEC initdb.k -g 1 -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
      for i in `seq 20000 20009`
      do
        for j in `seq 0 15`
        do
          echo ssh server-$j "cd $HOME;QHOME=/home/glenn/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
          ssh gp-2-$j "cd $HOME;QHOME=/home/mpiuser/q $HOME/l64/q initdb.k -p $i -g 1 </dev/null &> $i-$j.log &"
          # wait until the slave is listening before starting the next one
          while ! nc -z "gp-2-$j" $i; do sleep 0.1; done
        done
      done
      # get ready ??
      echo `date -u` $SLAVECOUNT slave tasks started
      # then start the servers aimed at the slaves
      baseport=5000
      for ((i=0; i<$CLIENTCOUNT; i++)); do
        $QEXEC initdb.k -g 1 -s -$SLAVECOUNT -p $((baseport+i)) </dev/null &>log$((baseport+i)).log&
        while ! nc -z localhost $((baseport+i)); do sleep 0.1; done
      done
      # check that everything can startup :
      $QEXEC startdb.q -s -$SLAVECOUNT -q
  • 13. What we changed (3):
      startdb.q
      … …
      / check all servers are there
      /{hopen(x;500)}each("I"$getenv`BASEPORT)+til"I"$getenv`SLAVECOUNT;
      {hopen(x;2500)}each hsym`$read0`:slavehostport.txt;
      \l initdb.k
      {hopen(x;500)}each 5000+til"I"$getenv`CLIENTCOUNT;
      cat slavehostport.txt:
      192.168.3.51:20000
      192.168.3.51:20001
      192.168.3.51:20002
      192.168.3.51:20003
      192.168.3.51:20004
      192.168.3.51:20005
      192.168.3.51:20006
      192.168.3.51:20007
      192.168.3.51:20008
      192.168.3.51:20009
      192.168.3.52:20000
      192.168.3.52:20001
      192.168.3.52:20002
      …. 160 times
  • 14. [Diagram: Slave (1), Slave (2), Slave (3) … Slave n; Lustre/DDN Service mounted at /mnt/onefilesystem; Q clients: Slave x10 (four groups).] Up to 1TB/sec… “n”-way server striping or by date/sym
  • 15. Results of Scaling the service …. [Chart: Latency reduction (number of seconds for query), axis 0 to 250; bars: Single Thread, Lustre; *lower is better.] The Parallel FS solution shows near-linear scalability for one instance running over many nodes, as measured from kdb+. Latency is the time spent waiting for a kdb+ query over 245GB of data to complete. To put this in context, these nodes were equipped with only 64GB of memory.
  • 16. Some of the many Benefits of kdb+ on Parallel FS
      1. Significant decrease in operational latency per kdb+ query, especially when running queries that search through significant amounts of historical market information. Achieved by balancing content around multiple file system servers.
      2. Parallelization of kdb+ query “threads” in a single shared namespace, allowing a user to treat any data workload independently from other data workloads. A “query from hell” on a production system is now OK?
      3. Simultaneous read/write operations on a single namespace for the entire database and for any number of kdb+ clients (e.g. end-of-day data consolidations into an hdb instance).
      4. Sharing of data amongst different independent hdb/rdb instances. Many instances of kdb+ can view the same data, meaning that strategies for data sharing and private data segments may be consolidated onto the same space. Avoids the need for kdb+ admins to physically copy data around the network or disks.
      5. Kdb+ context can be “striped” around all FS servers, or can be allocated in a round-robin fashion against each server. Striping allows the opportunity for some files to attain maximal I/O rates for a single kdb+ “object”.
  • 17. Next Steps?
  • 18. Thanks! gwright@ddn.com www.ddn.com /Big Data Glenn Wright, EMEA Systems Architect DDN, June 2014
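
Slides 12 and 13 start one kdb+ slave per port (20000 to 20009) on each of 16 hosts and then read the 160 resulting endpoints from slavehostport.txt. The list could be generated rather than maintained by hand. What follows is a minimal sketch under stated assumptions: the consecutive address range starting at 192.168.3.51 is extrapolated from the two hosts visible on slide 13, and `gen_hostports` is a hypothetical helper name, not something from the deck.

```shell
#!/bin/sh
# Hypothetical helper: print one host:port line per slave task.
# Assumption: hosts are numbered consecutively from 192.168.3.51 (only .51
# and .52 appear on the slide); ports start at 20000 as in the launch loop.
gen_hostports() {
    hosts=$1        # number of slave hosts, e.g. 16
    per_host=$2     # slave tasks per host, e.g. 10
    h=0
    while [ "$h" -lt "$hosts" ]; do
        p=0
        while [ "$p" -lt "$per_host" ]; do
            # host-major order, the same order shown on slide 13
            echo "192.168.3.$((51 + h)):$((20000 + p))"
            p=$((p + 1))
        done
        h=$((h + 1))
    done
}
```

Running `gen_hostports 16 10 > slavehostport.txt` would write 16 x 10 = 160 lines, which matches `SLAVECOUNT=160` on slide 11.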

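Slide 16 mentions that kdb+ data can be allocated in a round-robin fashion against each file system server as an alternative to striping. The idea can be sketched as a simple modulo mapping from partition to server. This is an illustration only: `assign_round_robin` is a hypothetical name, the dates are made up, and on a real Lustre deployment placement is governed by the file system's own layout settings rather than by a script like this.

```shell
#!/bin/sh
# Hypothetical illustration: map each date partition to one of N servers
# in round-robin order. Server names follow the OSS1..OSS4 labels on
# slide 10; the partition dates below are invented for the example.
assign_round_robin() {
    servers=$1      # number of file system servers, e.g. 4
    shift
    i=0
    for part in "$@"; do
        # partition i goes to server (i mod N) + 1
        echo "$part OSS$((i % servers + 1))"
        i=$((i + 1))
    done
}

assign_round_robin 4 2011.01.03 2011.01.04 2011.01.05 2011.01.06 2011.01.07
```

With four servers, the fifth partition wraps around to OSS1, so consecutive dates land on different servers and a query spanning several dates spreads its reads across all of them.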
Editor's Notes

  1. Don’t create data copies in local flash/NVRAM: higher cost (capacity, admin, power, space, software licensing); no sharing, higher data risk, long time to consolidate and checkpoint. Consolidate to a single system that delivers linear scaling performance, a single point of admin, and higher density.