www.twosigma.com
Smooth Storage
A storage system for managing structured time series data at Two Sigma
September 13, 2018
Proprietary and Confidential – Not for Redistribution
Saurabh Goel
saurabh.goel@twosigma.com
Disclaimer
This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer
to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon
for, investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates
(collectively, “Two Sigma”). Such views reflect the assumptions of the author(s) of the document and are subject to change without
notice. The document may employ data derived from third-party sources. No representation is made by Two Sigma as to the accuracy of
such information and the use of such information in no way implies an endorsement of the source of such information or its validity.
The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two
Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for
identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark
does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.
Outline
• Motivation and design emphasis
• Data Model and API
• Implementation of the data model
• System Architecture
• Looking Forward
Motivation
• Why have specialized storage for time series data?
  • Extremely common at Two Sigma
  • Time is one of the primary dimensions along which applications want to partition and filter data
  • Scale, in terms of both size and access
  • Optimizing for the target application workload and requirements
Smooth’s design emphasis
• Optimized for range queries and range updates executed in parallel per table
• File-system-like operations, but with database-like properties such as atomicity and an isolation model for concurrent access
• Centrally managed service at Two Sigma
• Higher expectations around reliability, availability, and multi-tenancy (security, access control, fair sharing of resources, etc.)
• Storage efficiency is also a major concern given the overall size of data stored
File system ------------------------------ Smooth --------------- Database
Target Application characteristics
• Parallel, time-partitioned jobs that move a lot of data
• Tend to be batch-oriented; care more about throughput than latency
• New use cases are demanding better latency, smaller IO, and more query power
• Not good for workloads that require very low latencies or issue large numbers of small reads and writes
Data Model
• Tables with schema; mandatory time column
• Rows ordered and indexed by time
• Not relational: duplicate timestamps/rows allowed; no notion of primary key, but users can enforce PK constraints in their applications
• Easy to update schema
• Can store wide sparse schemas efficiently
Write API
Updates a given time range atomically; the existing rows belonging to the range
are replaced by the given set of new rows
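// open a write session over the half-open time range [10, 42)
WriteSession s = write(table, [10, 42));
s.addRow(<10, ..>);
s.addRow(<15, ..>);
// repeated timestamp is ok
s.addRow(<15, ..>);
// invalid: rows must be added in non-decreasing order (10 comes after 15)
s.addRow(<10, ..>);
// invalid: rows must lie within the given time range (50 is outside [10, 42))
s.addRow(<50, ..>);
// atomically replaces the existing rows in [10, 42) with the rows added above
s.commit();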
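The rules shown in the snippet above (rows in non-decreasing time order, every row inside the session's half-open range, and the whole range replaced at commit) can be sketched in plain Java. This is only an illustration under stated assumptions: the class `WriteSessionSketch`, its methods, and the choice to model a table as a sorted list of timestamps are all hypothetical and are not Smooth's actual client API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the write-session rules; not Smooth's real client API. */
public class WriteSessionSketch {
    private final long start;   // inclusive lower bound of the time range
    private final long end;     // exclusive upper bound of the time range
    private final List<Long> pending = new ArrayList<>(); // timestamps of buffered rows
    private long lastTime = Long.MIN_VALUE;

    public WriteSessionSketch(long start, long end) {
        if (start >= end) throw new IllegalArgumentException("empty time range");
        this.start = start;
        this.end = end;
    }

    /** Buffers a row (only its timestamp here) after checking range and ordering. */
    public void addRow(long time) {
        if (time < start || time >= end)
            throw new IllegalArgumentException("row " + time + " is outside the range");
        if (time < lastTime)
            throw new IllegalArgumentException("row " + time + " is out of order");
        pending.add(time);
        lastTime = time;
    }

    /** Returns the table (sorted timestamps) with [start, end) replaced by the buffered rows. */
    public List<Long> commit(List<Long> table) {
        List<Long> result = new ArrayList<>();
        for (long t : table) if (t < start) result.add(t);  // rows before the range survive
        result.addAll(pending);                              // new rows replace the range
        for (long t : table) if (t >= end) result.add(t);   // rows after the range survive
        return result;
    }
}
```

Committing a session with no rows added removes everything in the range, which matches the slide's point that delete is a special case of update.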
Write API
• The set of write operations on a table forms a total order; internally, each write gets a unique, strictly monotonically increasing logical commit timestamp
• Distributed atomic writes are possible
• Delete is just a special case of update where no new rows are written
Read API
• Rows returned are based on the latest committed view of the table at the start of the read operation; the read remains isolated from concurrent writes.
• Snapshot reads over a given time range
Iterator<Row> i = read(table, timeRange);
while (i.hasNext()) {
    doSomething(i.next());
}
Other Operations
• Some operations are not officially supported but are a natural fit for Smooth:
  • Distributed snapshot reads
  • Reads in the past, permanent snapshots
  • Atomic read-modify-write operations using optimistic concurrency control (OCC) on the commit time
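To illustrate how optimistic concurrency control on the commit time could give an atomic read-modify-write, here is a small in-memory sketch. It assumes the check is table-wide and that a single `long` stands in for the rows being modified; the class `OccSketch` and its methods are hypothetical and do not come from Smooth.

```java
import java.util.function.LongUnaryOperator;

/** Hypothetical in-memory stand-in for a table; not Smooth's real API. */
public class OccSketch {
    private long commitTime = 0; // logical commit timestamp of the latest write
    private long value = 0;      // stands in for the rows of some time range

    public synchronized long latestCommitTime() { return commitTime; }

    public synchronized long read() { return value; }

    /** Commits only if no other write has landed since expectedCommitTime. */
    public synchronized boolean writeIfUnchanged(long expectedCommitTime, long newValue) {
        if (commitTime != expectedCommitTime) return false; // conflict: caller should retry
        value = newValue;
        commitTime++; // each write gets a strictly increasing commit timestamp
        return true;
    }

    /** Read-modify-write loop: retry until the conditional commit succeeds. */
    public long update(LongUnaryOperator f) {
        while (true) {
            long seen;
            long current;
            synchronized (this) { // take a consistent snapshot of data and commit time
                seen = commitTime;
                current = value;
            }
            long next = f.applyAsLong(current);
            if (writeIfUnchanged(seen, next)) return next;
        }
    }
}
```

A writer that computed its update from a stale snapshot has its commit rejected and must re-read, so no concurrent update is silently lost.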
Table Implementation
[Figure: Shard 1 and Shard 2 plotted against the time column and commit time (c1, c2), with the overwritten time range marked. The figure also labels a metadata layer and a data layer (data file, replica).]
• A shard is the internal representation of an update operation; semantically immutable
• A data file contains the new set of ordered rows; immutable and indexed; potentially replicated
Read Algorithm
[Figure: Shards 1 to 4 plotted against the time column and commit time, with the range to read and the start of the read marked.]
• Reads are implemented by concatenating together visible subranges of overlapping shards; we call this the “read plan”
• The underlying data file per shard is ordered and indexed and can efficiently select rows belonging to visible subranges
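A read plan of this kind can be sketched as follows. The sketch assumes each shard covers a half-open time range and carries a unique commit time, and that a read sees only shards committed at or before its snapshot (`asOf`); at every point in time the covering shard with the highest commit time wins. The names `Shard`, `Fragment`, and `plan` are hypothetical, and the quadratic scan is chosen for clarity rather than speed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of computing a read plan; names and layout are assumptions. */
public class ReadPlanSketch {
    /** One update: replaces the half-open time range [start, end) at commitTime. */
    public static final class Shard {
        public final int id; public final long start; public final long end; public final long commitTime;
        public Shard(int id, long start, long end, long commitTime) {
            this.id = id; this.start = start; this.end = end; this.commitTime = commitTime;
        }
    }

    /** A visible subrange [start, end) to be served from one shard's data file. */
    public static final class Fragment {
        public final int shardId; public final long start; public final long end;
        public Fragment(int shardId, long start, long end) {
            this.shardId = shardId; this.start = start; this.end = end;
        }
        @Override public String toString() { return shardId + ":[" + start + "," + end + ")"; }
    }

    public static List<Fragment> plan(List<Shard> shards, long from, long to, long asOf) {
        if (from >= to) throw new IllegalArgumentException("empty read range");
        // Cut the read range at every visible shard boundary that falls inside it.
        TreeSet<Long> cuts = new TreeSet<>();
        cuts.add(to);
        for (Shard s : shards) {
            if (s.commitTime > asOf) continue; // committed after the snapshot: invisible
            if (s.start > from && s.start < to) cuts.add(s.start);
            if (s.end > from && s.end < to) cuts.add(s.end);
        }
        List<Fragment> plan = new ArrayList<>();
        long prev = from;
        for (long cut : cuts) {
            // Within [prev, cut) the winner is the covering shard with the latest commit.
            Shard top = null;
            for (Shard s : shards) {
                boolean covers = s.commitTime <= asOf && s.start <= prev && prev < s.end;
                if (covers && (top == null || s.commitTime > top.commitTime)) top = s;
            }
            if (top != null) {
                int last = plan.size() - 1;
                if (last >= 0 && plan.get(last).shardId == top.id && plan.get(last).end == prev) {
                    // Same shard continues: extend the previous fragment.
                    plan.set(last, new Fragment(top.id, plan.get(last).start, cut));
                } else {
                    plan.add(new Fragment(top.id, prev, cut));
                }
            }
            prev = cut;
        }
        return plan;
    }
}
```

Because the plan is a list of non-overlapping fragments already in time order, the reader can concatenate them without comparing individual rows across shards.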
Data File format
The underlying data file is indexed using a simple two-level static B+ tree
Data File format
A data file has one index block and individually compressed data blocks laid out contiguously
• A data block is the unit of read: variable-sized and compressed, typically a small number of MBs; blocks allow random access and parallelization
• We currently use LZ4 for most files; it has very low overhead but still gives us about 2x compression on average. We have used gzip for some of the cold data files
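One plausible reading of the two-level layout is that the index block records the first timestamp of each data block, so a range read binary-searches the index and then touches only the blocks that can contain matching rows. The sketch below assumes exactly that; the class `BlockIndexSketch`, the `firstTimes` array, and the method names are hypothetical, and the real on-disk format is not described in the slides.

```java
/** Hypothetical sketch of a two-level lookup: one index block over many data blocks. */
public class BlockIndexSketch {
    /** Number of entries in sorted[] that are strictly less than key (binary search). */
    static int countLess(long[] sorted, long key) {
        int lo = 0, hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    /**
     * firstTimes[i] is the timestamp of the first row in data block i (non-decreasing).
     * Returns {firstBlock, endBlockExclusive}: the blocks that may hold rows in [from, to).
     */
    public static int[] blocksFor(long[] firstTimes, long from, long to) {
        if (from >= to) return new int[] {0, 0};
        // Duplicate timestamps may straddle a block boundary, so start one block
        // before the first block whose first timestamp is >= from.
        int first = Math.max(0, countLess(firstTimes, from) - 1);
        // Any block whose first timestamp is < to may still hold rows in range.
        int end = countLess(firstTimes, to);
        return new int[] {first, end};
    }
}
```

Each selected block can then be fetched and decompressed independently, which is what makes random access and parallel reads possible.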
Compaction
Problem: overwrites of random time ranges and small writes
• Excessive fragmentation of the read plan; leads to slow reads and excessive seeks on the backend data stores, reducing overall serving capacity
• Metadata bloat; small shards/files mean larger metadata on Smooth and the object stores
• Garbage; data under hidden ranges can be garbage collected
Compaction Process
[Figure: Shards 1 to 4 plotted against the time column and commit time, with the commit point of the new compacted shard marked.]
• The shards it replaces are deleted after the new compacted shard is committed
• Underlying data files are not immediately deleted, to support ongoing reads
• Only contiguous fragments can be combined together!
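As a rough illustration of the contiguity rule, the sketch below walks a read plan in time order and groups adjacent small fragments whose time ranges touch into candidate runs for compaction. The selection policy (a byte threshold named `smallBytes`, runs of at least two fragments) is purely an assumption made for the example, as are the names `Fragment` and `candidateRuns`; the slides do not say how Smooth picks what to compact.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical policy sketch: pick runs of contiguous small fragments to compact. */
public class CompactionSketch {
    /** A visible fragment of the read plan covering [start, end), with its size. */
    public static final class Fragment {
        public final long start; public final long end; public final long bytes;
        public Fragment(long start, long end, long bytes) {
            this.start = start; this.end = end; this.bytes = bytes;
        }
    }

    /** Returns {firstIndex, lastIndex} pairs of runs worth combining into one shard. */
    public static List<int[]> candidateRuns(List<Fragment> plan, long smallBytes) {
        List<int[]> runs = new ArrayList<>();
        int runStart = -1; // index where the current run began, or -1 if none
        for (int i = 0; i <= plan.size(); i++) {
            boolean small = i < plan.size() && plan.get(i).bytes < smallBytes;
            boolean touchesPrevious = runStart == -1
                    || (i < plan.size() && plan.get(i - 1).end == plan.get(i).start);
            if (small && touchesPrevious) {
                if (runStart == -1) runStart = i;
                continue;
            }
            // The run ended: keep it only if it combines at least two fragments.
            if (runStart != -1 && i - runStart >= 2) runs.add(new int[] {runStart, i - 1});
            // A small fragment that failed only the contiguity check starts a new run.
            runStart = small ? i : -1;
        }
        return runs;
    }
}
```

Each run would be rewritten as a single new shard and data file, shortening the read plan while leaving the old data files in place until ongoing reads finish.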
Comparing with LSM
Similar to a log-structured merge (LSM) tree
• The Smooth implementation is log-structured
• Immutable shards with embedded B-trees are similar to “sstables”
• Both have compaction processes aimed at similar objectives
• They differ in details: each shard carries with it a “bulk delete” tombstone whose handling is deferred until compaction time
• The read algorithm is different: no row-level comparison for the “next” operation
• Key-value stores can use similar ideas to optimize bulk deletes
Write Amplification
• Write amplification = actual bytes written to storage / bytes written by user
• Has not been an issue in practice: less than 10 on average
• If the write workload gets more challenging (i.e. a higher rate of small random writes):
  • Use leveled compaction, similar to traditional key-value-based LSM storage engines
    • By allowing non-contiguous shards to be combined; shards essentially get moved into data files
    • Would make our read algorithm more complex: need to merge read plans from all levels
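The write amplification ratio defined above is simple arithmetic. The helper below is a hypothetical illustration; the class name, method name, and the byte counts in the usage note are invented for the example and are not measurements from Smooth.

```java
/** Hypothetical helper computing the write amplification ratio defined on the slide. */
public class WriteAmplification {
    /** Actual bytes written to storage divided by bytes written by the user. */
    public static double ratio(long storageBytes, long userBytes) {
        if (userBytes <= 0) throw new IllegalArgumentException("userBytes must be positive");
        return (double) storageBytes / userBytes;
    }
}
```

For instance, if a user writes 1,000 bytes and compaction later rewrites that data three times, storage sees 4,000 bytes in total and `ratio(4000, 1000)` is 4.0.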
System Architecture
• All Smooth metadata is stored on Microsoft SQL Server, which is replicated to backup servers in a remote data center
• Stateless metadata servers front the database, providing functions like authorization, quota enforcement, and QoS (fair sharing of resources)
• Applications link with a Smooth client library in order to access Smooth
System Architecture
• Data files are stored in object stores
• Multiple different types of object stores can be plugged into Smooth and federated together for scaling, replicated across for geo-redundancy/availability, or used for storage tiering
• Currently we use HDFS for warm data and CELFS for cold data; CELFS is an internal archival file system at Two Sigma
Virtues of Immutability
• A design principle we have been using is immutability, both physical (write-once data files) and semantic (shards)
• The combination of linear metadata (i.e. strictly increasing commit timestamps) and immutable elements means that user reads and updates, the shard compaction process, and the physical data movement process can operate in parallel with no interference and with minimal coordination
• Data files can be cached without worrying about consistency
This simple model has been central to keeping the system simple, robust and scalable.
Some Statistics
• Multiple PBs of unique compressed data
• Read peaks in excess of 100 GB/s (before decompressing)
• 100s of millions of files/shards
• 10s of millions of tables
• 10s of thousands of concurrent requests
Looking Forward
• Multi-datacenter and public cloud read scaling
  • CDN-like distributed caching layer that spans even to sites that don’t store data
  • Encryption at rest may be important for cloud use cases
• More cost-efficient multi-DC replication and cold data storage
  • Data stores that use erasure coding
  • More efficient data encoding and compression
  • Data stores that can replicate data across data centers and support desirable failover semantics
Looking Forward
• Performance
  • Performance consistency is a major concern; tail latencies are a major issue with HDFS
  • Issues with slow serialization and parsing of rows
• More challenging workloads
  • Interactive, latency-sensitive workloads are becoming common
  • Column filtering
  • Complex read queries
Looking Forward
Complex queries
• It is common for time series datasets to have multiple sub-series merged together by time, like prices per stock ticker. The sub-series is typically identified by another column. The cardinality of this column is generally in the 10k to 20k range
• Example query: given an arbitrary subset of tickers and a time range, return all matching rows ordered by time
• In reality each ticker has its own time range, and there are several variations of this query
• Looking at new kinds of indexing
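The example query can be written as a plain scan over time-ordered rows, which also shows why it is expensive without further indexing: every row in the time range is examined even when only a few tickers match. The `Row` fields (`time`, `ticker`, `price`) and the `query` method are hypothetical names chosen for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical baseline for the example query: filter a time-ordered scan by ticker. */
public class TickerQuerySketch {
    public static final class Row {
        public final long time; public final String ticker; public final double price;
        public Row(long time, String ticker, double price) {
            this.time = time; this.ticker = ticker; this.price = price;
        }
    }

    /** rows must already be ordered by time; returns matches in [from, to) in that order. */
    public static List<Row> query(List<Row> rows, Set<String> tickers, long from, long to) {
        List<Row> out = new ArrayList<>();
        for (Row r : rows) {
            if (r.time >= to) break; // input is time-ordered, so nothing later can match
            if (r.time >= from && tickers.contains(r.ticker)) out.add(r);
        }
        return out;
    }
}
```

An index that also covers the ticker column would let a reader skip blocks with no matching ticker, which is the kind of improvement the last bullet points toward.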
Looking Forward
• Moving away from a “thick” Smooth client
  • Enables quick iteration and bug fixes
  • Multi-language support
  • Opens up many architectural possibilities like caching, easier access control, QoS, etc.
• Various other reliability, multi-tenancy, metadata scaling, security and operability improvements
Thank You!
Identifying Emergent Behaviors in Complex Systems - Jane AdamsIdentifying Emergent Behaviors in Complex Systems - Jane Adams
Identifying Emergent Behaviors in Complex Systems - Jane Adams
 
Algorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + PracticeAlgorithmic Data Science = Theory + Practice
Algorithmic Data Science = Theory + Practice
 
HUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For SparkHUOHUA: A Distributed Time Series Analysis Framework For Spark
HUOHUA: A Distributed Time Series Analysis Framework For Spark
 
Improving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache ArrowImproving Python and Spark Performance and Interoperability with Apache Arrow
Improving Python and Spark Performance and Interoperability with Apache Arrow
 
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
TRIEST: Counting Local and Global Triangles in Fully-Dynamic Streams with Fix...
 
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
Exploring the Urban – Rural Incarceration Divide: Drivers of Local Jail Incar...
 
Graph Summarization with Quality Guarantees
Graph Summarization with Quality GuaranteesGraph Summarization with Quality Guarantees
Graph Summarization with Quality Guarantees
 
Rademacher Averages: Theory and Practice
Rademacher Averages: Theory and PracticeRademacher Averages: Theory and Practice
Rademacher Averages: Theory and Practice
 
Credit-Implied Volatility
Credit-Implied VolatilityCredit-Implied Volatility
Credit-Implied Volatility
 
Principles of REST API Design
Principles of REST API DesignPrinciples of REST API Design
Principles of REST API Design
 

Último

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations120cr0395
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escortsranjana rawat
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINESIVASHANKAR N
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxupamatechverse
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingrakeshbaidya232001
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performancesivaprakash250
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordAsst.prof M.Gokilavani
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...Soham Mondal
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Call Girls in Nagpur High Profile
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130Suhani Kapoor
 

Último (20)

Extrusion Processes and Their Limitations
Extrusion Processes and Their LimitationsExtrusion Processes and Their Limitations
Extrusion Processes and Their Limitations
 
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
(MEERA) Dapodi Call Girls Just Call 7001035870 [ Cash on Delivery ] Pune Escorts
 
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINEMANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
MANUFACTURING PROCESS-II UNIT-2 LATHE MACHINE
 
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANJALI) Dange Chowk Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
Introduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptxIntroduction to Multiple Access Protocol.pptx
Introduction to Multiple Access Protocol.pptx
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Porous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writingPorous Ceramics seminar and technical writing
Porous Ceramics seminar and technical writing
 
UNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its PerformanceUNIT - IV - Air Compressors and its Performance
UNIT - IV - Air Compressors and its Performance
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete RecordCCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
CCS335 _ Neural Networks and Deep Learning Laboratory_Lab Complete Record
 
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
OSVC_Meta-Data based Simulation Automation to overcome Verification Challenge...
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...Top Rated  Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
Top Rated Pune Call Girls Budhwar Peth ⟟ 6297143586 ⟟ Call Me For Genuine Se...
 
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
 

Smooth Storage - A distributed storage system for managing structured time series data at Two Sigma

  • 1. Smooth Storage: A storage system for managing structured time series data at Two Sigma. Saurabh Goel (saurabh.goel@twosigma.com), www.twosigma.com. September 13, 2018. Proprietary and Confidential – Not for Redistribution.
  • 2. Disclaimer This document is being distributed for informational and educational purposes only and is not an offer to sell or the solicitation of an offer to buy any securities or other instruments. The information contained herein is not intended to provide, and should not be relied upon for, investment advice. The views expressed herein are not necessarily the views of Two Sigma Investments, LP or any of its affiliates (collectively, “Two Sigma”). Such views reflect the assumptions of the author(s) of the document and are subject to change without notice. The document may employ data derived from third-party sources. No representation is made by Two Sigma as to the accuracy of such information and the use of such information in no way implies an endorsement of the source of such information or its validity. The copyrights and/or trademarks in some of the images, logos or other material used herein may be owned by entities other than Two Sigma. If so, such copyrights and/or trademarks are most likely owned by the entity that created the material and are used purely for identification and comment as fair use under international copyright and/or trademark laws. Use of such image, copyright or trademark does not imply any association with such organization (or endorsement of such organization) by Two Sigma, nor vice versa.
  • 3. Outline
      • Motivation and design emphasis
      • Data Model and API
      • Implementation of the data model
      • System Architecture
      • Looking Forward
  • 4. Motivation
      • Why have specialized storage for time series data?
          • Extremely common at Two Sigma
          • Time is one of the primary dimensions along which applications want to partition and filter data
          • Scale, in terms of both size and access
          • Optimizing for the target application workload and requirements
  • 5. Smooth's design emphasis
      • Optimized for range queries and range updates executed in parallel per table
      • File-system-like operations, but with database-like properties such as atomicity and an isolation model for concurrent access
      • Centrally managed service at TS
      • Higher expectations around reliability, availability, and multi-tenancy (security, access control, fair sharing of resources, etc.)
      • Storage efficiency is also a major concern given the overall size of data stored
      • Figure: a spectrum running from file system to database, with Smooth in between
  • 6. Target Application characteristics
      • Parallel, time-partitioned jobs that move a lot of data
      • Tend to be batch oriented; care more about throughput than latency
      • New use cases are demanding better latency, smaller IO, and more query power
      • Not a good fit for workloads that require very low latencies or issue large numbers of small reads and writes
  • 8. Data Model
      • Tables with schema; mandatory time column
      • Rows ordered and indexed by time
      • Not relational: duplicate timestamps/rows are allowed; there is no notion of a primary key, but users can enforce PK constraints in their applications
      • Easy to update schema
      • Can store wide sparse schemas efficiently
  • 9. Write API
      • Updates a given time range atomically; the existing rows belonging to the range are replaced by the given set of new rows
          WriteSession s = write(table, [10, 42));
          s.addRow(<10, ..>);
          s.addRow(<15, ..>);
          // repeated timestamp is ok
          s.addRow(<15, ..>);
          // invalid: rows must be added in non-decreasing order
          s.addRow(<10, ..>);
          // invalid: rows must lie within the given time range
          s.addRow(<50, ..>);
          s.commit();
  • 10. Write API
      • The set of write operations to a table forms a total order; internally each write gets a unique, strictly monotonically increasing logical commit timestamp
      • Distributed atomic writes are possible
      • Delete is just a special case of update where no new rows are written
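The write semantics on the two Write API slides can be illustrated with a small in-memory model. This is a sketch under assumptions, not Smooth's implementation: the `Table` and `WriteSession` classes, the `add_row` name, and the single-process "atomic" swap are all hypothetical stand-ins for the behaviour the slides describe.

```python
import itertools


class Table:
    """Toy in-memory table: a list of (time, payload) rows sorted by time."""

    def __init__(self):
        self.rows = []
        self.last_commit = 0
        # Hypothetical source of strictly increasing logical commit timestamps.
        self._clock = itertools.count(1)

    def write(self, start, end):
        return WriteSession(self, start, end)


class WriteSession:
    """Buffers rows for the half-open time range [start, end) until commit()."""

    def __init__(self, table, start, end):
        self.table, self.start, self.end = table, start, end
        self.pending = []

    def add_row(self, time, payload=None):
        if not (self.start <= time < self.end):
            raise ValueError("row lies outside the write range")
        if self.pending and time < self.pending[-1][0]:
            raise ValueError("rows must be added in non-decreasing time order")
        self.pending.append((time, payload))

    def commit(self):
        t = self.table
        before = [r for r in t.rows if r[0] < self.start]
        after = [r for r in t.rows if r[0] >= self.end]
        # One assignment stands in for the atomic replacement of the range.
        t.rows = before + self.pending + after
        t.last_commit = next(t._clock)
        return t.last_commit


table = Table()
s = table.write(10, 42)
s.add_row(10, "a")
s.add_row(15, "b")
s.add_row(15, "c")  # repeated timestamp is fine
first = s.commit()

# Delete is an update of a range that adds no rows.
second = table.write(15, 20).commit()
```

After the second commit only the row at time 10 remains, and the two commits carry increasing logical timestamps.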
  • 11. Read API
      • Snapshot reads over a given time range
          Iterator<Row> i = read(table, time range);
          while (i.hasNext()) {
              doSomething(i.next());
          }
      • Rows returned are based on the latest committed view of the table at the start of the read operation; the read remains isolated from concurrent writes
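A minimal sketch of the snapshot-read behaviour, assuming the committed state is held as an immutable tuple that each write replaces wholesale. The `SnapshotTable` class and its method names are hypothetical; the point is only that a reader keeps the view that was current when the read started.

```python
class SnapshotTable:
    """Toy table whose committed state is an immutable tuple of (time, payload)."""

    def __init__(self):
        self._committed = ()

    def commit_range(self, start, end, new_rows):
        old = self._committed
        # Build a new tuple; readers holding the old one are unaffected.
        self._committed = (
            tuple(r for r in old if r[0] < start)
            + tuple(new_rows)
            + tuple(r for r in old if r[0] >= end)
        )

    def read(self, start, end):
        # The snapshot is captured when read() is called, not when the
        # iterator is consumed: a generator expression evaluates its
        # outermost iterable immediately.
        snapshot = self._committed
        return (r for r in snapshot if start <= r[0] < end)


t = SnapshotTable()
t.commit_range(0, 100, [(10, "a"), (20, "b")])

it = t.read(0, 100)
first_row = next(it)
t.commit_range(0, 100, [])   # concurrent delete of the whole range
rest = list(it)              # the in-flight read still sees its snapshot
later = list(t.read(0, 100)) # a new read sees the delete
```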
  • 12. Other Operations
      • Some operations that are not officially supported but are a natural fit for Smooth:
          • Distributed snapshot reads
          • Reads in the past, permanent snapshots
          • Atomic read-modify-write operations using optimistic concurrency control (OCC) on the commit time
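The slide only names OCC on the commit time as a natural fit, so the following is a hedged illustration of the general technique rather than anything Smooth exposes. `OccTable`, `write_if_unchanged`, and `read_modify_write` are hypothetical names, and a real system would perform the compare-and-commit step atomically in its metadata store.

```python
class OccTable:
    """Toy table that exposes its latest commit time for optimistic checks."""

    def __init__(self):
        self.rows = []
        self.commit_time = 0

    def read(self, start, end):
        # Return the commit time observed together with the rows.
        return self.commit_time, [r for r in self.rows if start <= r[0] < end]

    def write_if_unchanged(self, start, end, new_rows, expected_commit_time):
        # Reject the write if another commit landed since the read.
        if self.commit_time != expected_commit_time:
            return False
        self.rows = (
            [r for r in self.rows if r[0] < start]
            + list(new_rows)
            + [r for r in self.rows if r[0] >= end]
        )
        self.commit_time += 1
        return True


def read_modify_write(table, start, end, transform, max_attempts=5):
    """Retry the read-transform-write cycle until it commits without conflict."""
    for _ in range(max_attempts):
        seen, rows = table.read(start, end)
        if table.write_if_unchanged(start, end, transform(rows), seen):
            return True
    return False


t = OccTable()
t.write_if_unchanged(0, 100, [(10, 1), (20, 2)], expected_commit_time=0)
ok = read_modify_write(t, 0, 100, lambda rows: [(ts, v * 2) for ts, v in rows])
stale = t.write_if_unchanged(0, 100, [], expected_commit_time=1)  # out of date
```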
  • 14. Table Implementation
      • Figure labels: Time column, Commit time (c1, c2), Shard 1, Shard 2, overwritten time range, Data file, Replica
      • Metadata layer: a shard is the internal representation of an update operation; semantically immutable
      • Data layer: a data file contains the new set of ordered rows; immutable and indexed; potentially replicated
  • 15. Read Algorithm
      • Figure: Shards 1 to 4 plotted by time column against commit time, with the requested read range and the start of the read marked
      • Reads are implemented by concatenating together visible subranges of overlapping shards; we call this the "read plan"
      • The underlying data file per shard is ordered and indexed, and can efficiently select rows belonging to visible subranges
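One way to compute such a read plan is sketched below, assuming each shard is described by a half-open time range and a unique commit time. The `Shard` tuple, the `read_plan` function, and the example shards are hypothetical; the slide does not say how Smooth computes the plan internally.

```python
from typing import NamedTuple


class Shard(NamedTuple):
    name: str
    start: int   # inclusive
    end: int     # exclusive
    commit: int  # unique logical commit time


def read_plan(shards, lo, hi, as_of):
    """Return [(shard_name, start, end)] fragments covering [lo, hi).

    Only shards committed at or before `as_of` are visible; where visible
    shards overlap, the one with the highest commit time wins.
    """
    visible = [s for s in shards
               if s.commit <= as_of and s.start < hi and s.end > lo]
    cuts = sorted({lo, hi} | {t for s in visible
                              for t in (s.start, s.end) if lo < t < hi})
    plan = []
    for a, b in zip(cuts, cuts[1:]):
        covering = [s for s in visible if s.start <= a and b <= s.end]
        if not covering:
            continue  # no data was ever written for this interval
        winner = max(covering, key=lambda s: s.commit)
        if plan and plan[-1][0] == winner.name and plan[-1][2] == a:
            plan[-1] = (winner.name, plan[-1][1], b)  # extend the fragment
        else:
            plan.append((winner.name, a, b))
    return plan


shards = [
    Shard("S1", 0, 100, 1),
    Shard("S2", 20, 40, 2),
    Shard("S3", 30, 60, 3),
    Shard("S4", 80, 120, 4),  # committed after the read started
]
plan = read_plan(shards, lo=10, hi=90, as_of=3)
```

In this example the plan is S1 over [10, 20), S2 over [20, 30), S3 over [30, 60), and S1 again over [60, 90); S4 is invisible because it was committed after the read's snapshot.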
  • 16. Data File format
      • The underlying data file is indexed using a simple two-level static B+Tree
  • 17. Data File format
      • A data file has one index block and individually compressed data blocks laid out contiguously
      • A data block is the unit of read; variable sized and compressed; typically a small number of MBs; allows random access and parallelization
      • Currently lz4 is used for most of the files; very low overhead but still gives about 2x compression on average; gzip has been used for some of the cold data files
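The layout of one index block plus individually compressed data blocks can be sketched as follows. Everything here is an assumption made for illustration: the index is a plain sorted list searched with `bisect` rather than a B+Tree, rows are JSON-encoded, blocks hold a fixed number of rows, and `zlib` from the standard library stands in for lz4.

```python
import bisect
import json
import zlib


def build_data_file(rows, rows_per_block=2):
    """Return (index, blob); index holds (first_time, offset, length) per block."""
    index, chunks, offset = [], [], 0
    for i in range(0, len(rows), rows_per_block):
        block = rows[i:i + rows_per_block]
        packed = zlib.compress(json.dumps(block).encode())
        index.append((block[0][0], offset, len(packed)))
        chunks.append(packed)
        offset += len(packed)
    return index, b"".join(chunks)


def read_range(index, blob, start, end):
    """Decompress only the blocks that can hold rows in [start, end)."""
    first_times = [entry[0] for entry in index]
    # Begin at the last block whose first timestamp is strictly below
    # `start`; it may still end with rows at or after `start`.
    i = max(bisect.bisect_left(first_times, start) - 1, 0)
    out = []
    while i < len(index) and index[i][0] < end:
        _, offset, length = index[i]
        block = json.loads(zlib.decompress(blob[offset:offset + length]))
        out.extend(tuple(r) for r in block if start <= r[0] < end)
        i += 1
    return out


rows = [(10, "a"), (15, "b"), (15, "c"), (20, "d"), (30, "e")]
index, blob = build_data_file(rows)
selected = read_range(index, blob, 15, 30)
```

Because each block is compressed on its own, a range read touches only the blocks the index points at, which is what makes random access and parallel reads possible.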
  • 18. Compaction
      • Problem: overwrites of random time ranges and small writes
          • Excessive fragmentation of the read plan; leads to slow reads, and excessive seeks on the backend data stores, reducing overall serving capacity
          • Metadata bloat; small shards/files mean larger metadata on Smooth and the object stores
          • Garbage; data under hidden ranges can be garbage collected
  • 19. Compaction Process
      • Figure: Shards 1 to 4 plotted by time column against commit time, with the new compacted shard and its commit point marked
      • Only contiguous fragments can be combined together
      • The input shards are deleted after the new shard is committed
      • Underlying data files are not immediately deleted, to support ongoing reads
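A rough sketch of the selection step, assuming the read plan is a time-ordered list of `(commit_time, start, end)` fragments. The function names, the thresholds, and the use of time span as a proxy for fragment size are hypothetical simplifications; a real compactor would look at bytes or row counts and would rewrite the underlying rows into a new data file.

```python
def pick_compaction_run(plan, small=10, min_fragments=3):
    """Return indices of the first run of adjacent small fragments, or None."""
    run = []
    for idx, (_commit, start, end) in enumerate(plan):
        is_small = end - start < small
        adjacent = bool(run) and plan[run[-1]][2] == start
        if is_small and (not run or adjacent):
            run.append(idx)
            continue
        if len(run) >= min_fragments:
            break
        run = [idx] if is_small else []
    return run if len(run) >= min_fragments else None


def compact(plan, small=10, min_fragments=3):
    """Replace one contiguous run of small fragments with a single fragment."""
    run = pick_compaction_run(plan, small, min_fragments)
    if run is None:
        return plan
    frags = [plan[i] for i in run]
    # The new shard takes the maximum commit time of its inputs.
    merged = (max(f[0] for f in frags), frags[0][1], frags[-1][2])
    return plan[:run[0]] + [merged] + plan[run[-1] + 1:]


plan = [(1, 0, 100), (5, 100, 104), (3, 104, 109), (7, 109, 112), (2, 112, 300)]
compacted = compact(plan)
```

Here the three small fragments covering [100, 112) collapse into one fragment with commit time 7, and the large neighbours are left alone.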
  • 20. Comparing with LSM
      • Similar to a Log Structured Merge (LSM) tree
          • The Smooth implementation is log structured
          • Immutable shards with embedded B-trees are similar to "sstables"
          • Both have compaction processes aimed at similar objectives
      • Differs in details
          • Each shard carries with it a "bulk delete" tombstone whose handling is deferred until compaction time
          • The read algorithm is different: no row-level comparison for the "next" operation
      • Key-value stores can use similar ideas to optimize bulk deletes
  • 21. Write Amplification
      • Write amplification = actual bytes written to storage / bytes written by user
      • Has not been an issue in practice: less than 10 on average
      • If the write workload gets more challenging (i.e. a higher rate of small random writes):
          • Use leveled compaction, similar to traditional key-value based LSM storage engines, by allowing non-contiguous shards to be combined; shards essentially get moved into data files
          • This would make the read algorithm more complex, since read plans from all levels would need to be merged
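As a worked example of the ratio defined on this slide, the snippet below uses made-up numbers: 1 GB written by the user and three hypothetical compaction rewrites of that same data. The figures are illustrative only and are not measurements from Smooth.

```python
def write_amplification(user_bytes, storage_bytes):
    """Actual bytes written to storage divided by bytes written by the user."""
    if user_bytes <= 0:
        raise ValueError("user_bytes must be positive")
    return storage_bytes / user_bytes


GB = 10**9
user = 1 * GB
# Hypothetical: the initial write plus three compaction rewrites of the data.
storage = user + 3 * user
wa = write_amplification(user, storage)
```

With these numbers the ratio is 4, comfortably under the "less than 10 on average" figure quoted on the slide.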
  • 23. System Architecture (figure: architecture diagram)
  • 24. System Architecture
      • All Smooth metadata is stored on Microsoft SQL Server, which is replicated to backup servers in a remote data center
      • Stateless metadata servers front the database, providing functions like authorization, quota enforcement, and QoS (fair sharing of resources)
      • Applications link with a Smooth client library in order to access Smooth
  • 25. System Architecture
      • Data files are stored in object stores
      • Multiple different types of object stores can be plugged into Smooth and federated together for scaling, replicated across for geo-redundancy/availability, or used for storage tiering
      • Currently HDFS is used for warm data and CELFS for cold data; CELFS is an internal archival file system at TS
  • 26. Virtues of Immutability
      • A design principle we have been using is immutability, both physical (write-once data files) and semantic (shards)
      • The combination of linear metadata (i.e. strictly increasing commit timestamps) and immutable elements means that user reads and updates, the shard compaction process, and the physical data movement process can operate in parallel with no interference and with minimal coordination
      • Data files can be cached without worrying about consistency
      • This simple model has been central to keeping the system simple, robust and scalable
  • 27. Some Statistics
      • Multiple PBs of unique compressed data
      • Read peaks in excess of 100 GB/s (before decompressing)
      • 100s of millions of files/shards
      • 10s of millions of tables
      • 10s of thousands of concurrent requests
  • 29. Looking Forward
      • Multi-datacenter and public cloud read scaling
          • CDN-like distributed caching layer that spans even to sites that don't store data
          • Encryption at rest may be important for cloud use cases
      • More cost-efficient multi-DC replication and cold data storage
          • Data stores that use erasure coding
          • More efficient data encoding and compression
          • Data stores that can replicate data across data centers and support desirable failover semantics
  • 30. Looking Forward
      • Performance
          • Performance consistency is a major concern; tail latencies are a major issue with HDFS
          • Issues with slow serialization and parsing of rows
      • More challenging workloads
          • Interactive workloads are becoming common and are latency sensitive
          • Column filtering
          • Complex read queries
  • 31. Looking Forward
      • Complex queries
          • It is common for time series datasets to have multiple sub-series merged together by time, like prices per stock ticker. The sub-series is typically identified by another column, whose cardinality is generally in the 10k to 20k range
          • Example query: given an arbitrary subset of tickers and a time range, return all matching rows ordered by time
          • In reality each ticker has its own time range, and there are several variations of this query
          • Looking at new kinds of indexing
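The example query can be made concrete with a small sketch. The slide only says new kinds of indexing are being explored, so this is one possible approach chosen for illustration, not Smooth's design: a hypothetical per-ticker index of time-sorted sub-series, sliced with `bisect` and combined with a k-way merge from `heapq`.

```python
import heapq
from bisect import bisect_left


def build_ticker_index(rows):
    """Group (time, ticker, price) rows, already sorted by time, by ticker."""
    index = {}
    for time, ticker, price in rows:
        index.setdefault(ticker, []).append((time, price))
    return index


def query(index, tickers, start, end):
    """Return rows for the given tickers with start <= time < end, by time."""

    def sub_series(ticker):
        series = index.get(ticker, [])
        # A 1-tuple sorts before any (time, price) with the same time, so
        # these calls find the first entries at or after `start` and `end`.
        lo = bisect_left(series, (start,))
        hi = bisect_left(series, (end,))
        return [(t, ticker, p) for t, p in series[lo:hi]]

    streams = (sub_series(t) for t in sorted(tickers))
    return list(heapq.merge(*streams, key=lambda row: row[0]))


rows = [
    (1, "A", 10.0),
    (2, "B", 20.0),
    (3, "A", 11.0),
    (3, "C", 5.0),
    (4, "B", 21.0),
]
index = build_ticker_index(rows)
result = query(index, {"A", "B"}, start=2, end=5)
```

The merge touches only the requested sub-series, so its cost depends on the number of matching rows rather than on the 10k to 20k tickers stored in the table.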
  • 32. Looking Forward
      • Moving away from a "thick" Smooth client
          • Enables quick iteration and bug fixes
          • Multi-language support
          • Opens up many architectural possibilities like caching, easier access control, QoS, etc.
      • Various other reliability, multi-tenancy, metadata scaling, security and operability improvements
  • 33. Thank You!

Editor's Notes

  1. A shard is semantically immutable, i.e. it always returns the same set of rows. The physical representation of the underlying data can change in format or storage location, or be replicated.
  2. The compaction process gets the read plan for the entire time range and finds areas with excessive fragmentation (many small fragments). It selects a contiguous segment of the read plan containing fragments to be fixed, and rewrites them as a single new shard. The commit time of the new shard is the maximum of the commit times of the participating input shards, which ensures that compaction does not interfere with ongoing writes. The underlying data files for the deleted shards are not removed immediately, so that references from the read plans of ongoing reads remain valid.