Tachyon is an open source project to build a reliable, memory-centric distributed storage system. This talk, given at the Tachyon Workshop on July 19, 2015, describes the architecture of the project and the growth of its community.
1. Bin Fan, Tachyon Nexus
July 19, 2015 @ Tachyon Workshop
tachyon-project.org
A Reliable Memory-Centric Distributed Storage System
2. • Founded by Tachyon creators and top contributors
• $7.5 million Series A from Andreessen Horowitz
• Committed to Tachyon Open Source
• www.tachyonnexus.com
13. An Example: Spark
• Fast, in-memory data processing framework
– Keep one in-memory copy inside JVM
– Track lineage of operations used to derive data
– Upon failure, use lineage to recompute data
Figure: Lineage Tracking (a chain of map, filter, join, and reduce operations)
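The lineage idea on this slide can be illustrated with a minimal sketch. It assumes a toy, single-machine dataset class; `LineageDataset`, `lose_cache`, and the other names are hypothetical and are not Spark's or Tachyon's API. Each dataset keeps one in-memory copy, remembers the operation that derived it, and recomputes from its parent if that copy is lost.

```python
class LineageDataset:
    """Toy dataset that records how it was derived (hypothetical, not a real API)."""

    def __init__(self, source=None, parent=None, op=None):
        self._source = source  # durable input, only set on a root dataset
        self._parent = parent  # dataset this one was derived from
        self._op = op          # function applied to the parent's records
        self._cache = None     # the single in-memory copy

    def map(self, fn):
        # Record the operation instead of running it immediately.
        return LineageDataset(parent=self, op=lambda rows: [fn(r) for r in rows])

    def filter(self, pred):
        return LineageDataset(parent=self, op=lambda rows: [r for r in rows if pred(r)])

    def collect(self):
        # Rebuild the in-memory copy from lineage whenever it is missing.
        if self._cache is None:
            if self._parent is None:
                self._cache = list(self._source)
            else:
                self._cache = self._op(self._parent.collect())
        return list(self._cache)

    def lose_cache(self):
        # Simulate a failure that wipes the in-memory copy.
        self._cache = None


base = LineageDataset(source=[1, 2, 3, 4, 5, 6])
result = base.map(lambda x: x * 10).filter(lambda x: x > 20)

first = result.collect()
result.lose_cache()           # failure: the cached copy is gone
recovered = result.collect()  # recomputed from the recorded lineage
```

No second copy of the data is kept here; recovery costs only the recomputation along the recorded chain of operations.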
14. Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
• Spark Job 1 and Spark Job 2 each hold blocks 1 and 3 in their own Spark memory (block manager)
• HDFS / Amazon S3 holds blocks 1, 2, 3, and 4
• Storage engine and execution engine in the same process (slow writes)
15. Issue 1
Data sharing is the bottleneck in the analytics pipeline: slow writes to disk
• A Spark job holds blocks 1 and 3 in Spark memory (block manager); a Hadoop MR job runs on YARN
• HDFS / Amazon S3 holds blocks 1, 2, 3, and 4
• Storage engine and execution engine in the same process (slow writes)
16. Issue 2
Cache loss when the process crashes
• A Spark task holds blocks 1 and 3 in Spark memory (block manager)
• HDFS / Amazon S3 holds blocks 1, 2, 3, and 4
• Execution engine and storage engine in the same process
17. Issue 2
Cache loss when the process crashes
• The Spark task crashes while blocks 1 and 3 are in Spark memory (block manager)
• HDFS / Amazon S3 holds blocks 1, 2, 3, and 4
• Execution engine and storage engine in the same process
18. Issue 2
Cache loss when the process crashes
• After the crash, the blocks cached in Spark memory are gone
• HDFS / Amazon S3 still holds blocks 1, 2, 3, and 4
• Execution engine and storage engine in the same process
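The failure mode in Issue 2 can be modeled with a short sketch. It is a toy simulation under stated assumptions: `InProcessEngine`, `ExternalStore`, and `DecoupledEngine` are hypothetical names, a "crash" is modeled by creating a fresh engine object, and `ExternalStore` merely stands in for storage that lives outside the engine process. It is not Tachyon's or Spark's API.

```python
class InProcessEngine:
    """Execution engine whose block cache lives in its own process memory."""

    def __init__(self):
        self.cache = {}

    def put(self, block_id, data):
        self.cache[block_id] = data

    def get(self, block_id):
        return self.cache.get(block_id)


class ExternalStore:
    """Stand-in for storage that runs outside the engine process (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def put(self, block_id, data):
        self.blocks[block_id] = data

    def get(self, block_id):
        return self.blocks.get(block_id)


class DecoupledEngine:
    """Execution engine that keeps no block state of its own."""

    def __init__(self, store):
        self.store = store

    def put(self, block_id, data):
        self.store.put(block_id, data)

    def get(self, block_id):
        return self.store.get(block_id)


# Coupled case: a restarted process begins with empty memory.
coupled = InProcessEngine()
coupled.put("block 1", b"rows")
coupled = InProcessEngine()  # crash and restart
lost = coupled.get("block 1")

# Decoupled case: the store outlives the engine that wrote to it.
store = ExternalStore()
decoupled = DecoupledEngine(store)
decoupled.put("block 1", b"rows")
decoupled = DecoupledEngine(store)  # crash and restart
kept = decoupled.get("block 1")
```

In the coupled case the restarted engine would have to reload the block from HDFS / Amazon S3; in the decoupled case the cached block is still available.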
21. Technical Overview
Ideas
• A memory-centric storage architecture
• Push lineage down to storage layer
• Manage tiered storage
Facts
• One data copy in memory
• Re-computation for fault-tolerance
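The "manage tiered storage" idea can be sketched as follows. This is a minimal illustration under stated assumptions, not Tachyon's implementation: the `TieredStore` class, the tier names, the per-tier block capacities, and the least-recently-used eviction policy are all chosen here for the example.

```python
from collections import OrderedDict


class TieredStore:
    """Toy tiered block store, fastest tier first (hypothetical, not Tachyon's API)."""

    def __init__(self, tier_capacities):
        # tier_capacities: list of (name, max_blocks), ordered fastest to slowest.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in tier_capacities]

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            # Fell off the slowest tier: the block is dropped in this sketch.
            return
        _, cap, blocks = self.tiers[level]
        if len(blocks) >= cap:
            # Assumed policy: demote the least recently used block one tier down.
            victim_id, victim = blocks.popitem(last=False)
            self._insert(level + 1, victim_id, victim)
        blocks[block_id] = data

    def put(self, block_id, data):
        # Remove any stale copy so each block lives in exactly one tier.
        for _, _, blocks in self.tiers:
            blocks.pop(block_id, None)
        self._insert(0, block_id, data)

    def get(self, block_id):
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote to the fastest tier
                return data
        return None

    def location(self, block_id):
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None


store = TieredStore([("MEM", 2), ("SSD", 2)])
for i in (1, 2, 3):
    store.put(f"block {i}", f"data-{i}")

before = store.location("block 1")  # demoted when block 3 arrived
value = store.get("block 1")        # a read promotes the block again
after = store.location("block 1")
```

A block dropped from the slowest tier would, in keeping with the facts listed above, have to be recomputed or reread from persistent storage; the sketch does not model that step.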
35. Open Source Status
• Started at UC Berkeley AMPLab in Summer 2012
• Apache License 2.0, Version 0.7 (July 2015)
• Deployed at > 50 companies (July 2014)
• 30+ Companies Contributing
• Spark/MapReduce/Flink applications can run without code change
39. Thanks to Our Contributors!
Aaron Davidson
Abhiraj Butala
Achal Soni
Albert Chu
Ali Ghodsi
Andrew Ash
Anurag Khandelwal
Aslan Bekirov
Bill Zhao
Bin Fan
Bradley Childs
Calvin Jia
Carson Wang
Chao Chen
Cheng Chang
Cheng Hao
Colin Patrick McCabe
Dan Crankshaw
Darion Yaphet
David Capwell
David Zhu
Dina Leventol
Du Li
Fei Wang
Gene Pang
Gerald Zhang
Grace Huang
Haoyuan Li
Henry Saputra
Hobin Yoon
Huamin Chen
Jacky Li
Jey Kottalam
Jingxin Feng
Joseph Tang
Juan Zhou
Jun Aoki
Kun Xu
Lukasz Jastrzebski
Luogan Kun
Manu Goyal
Mark Hamstra
Mingfei Shi
Mubarak Seyed
Nan Dun
Nick Lanham
Orcun Simsek
Pengfei Xuan
Qianhao Dong
Qifan Pu
Ramaraju Indukuri
Raymond Liu
Rob Vesse
Robert Metzger
Rong Gu
Sean Zhong
Seonghwan Moon
Shaoshan Liu
Shivaram Venkataraman
Shu Peng
Srinivas Parayya
Tao Wang
Thu Kyaw
Timothy St. Clair
Vaishnav Kovvuri
Vikram Sreekanti
Xi Liu
Xiaomeng Huang
Xiaomin Zhang
Xing Lin
Yi Liu
Zhao Zhang