Data Orchestration Summit
www.alluxio.io/data-orchestration-summit-2019
November 7, 2019
Alluxio 2 Community Update
Speakers:
Calvin Jia, Alluxio
Bin Fan, Alluxio
For more Alluxio events: https://www.alluxio.io/events/
5. Cloud Native on AWS: AMI, CFT, EMR
[Diagram: Presto and Hive clusters, each with an Alluxio metadata and data cache; labels: compute-driven, continuous sync]
§ Alluxio AMI in the Marketplace
§ Alluxio CloudFormation Template for cluster deployment
§ AWS EMR with Alluxio via a bootstrap script
One-click deployment of Alluxio on AWS
6. Cloud Native on Google Cloud: Dataproc
[Diagram: Presto and Hive, each with an Alluxio metadata and data cache; labels: compute-driven, continuous sync]
§ Google Dataproc with Alluxio (init action integration available)
[Diagram label: Google Dataproc Cluster]
One-click deployment of Alluxio on Google Cloud
10. Scaling to 1 Billion+ Files
§ Challenge:
• Metadata for 1 file takes 1 KB of on-heap storage
• 1 billion files would take 1 TB of heap space, so GC becomes a big problem
§ Solution:
• Add a new tier with embedded RocksDB to store the inode tree
• Keep an in-memory cache of frequently used inodes
Scale to one billion files and beyond, with performance comparable to the previous on-heap implementation
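The sizing argument and the cache-in-front-of-a-store idea can be sketched as follows. This is a minimal illustration, not Alluxio's implementation: the class and function names are hypothetical, a plain dict stands in for embedded RocksDB, LRU is an assumed eviction policy, and 1 KB is taken as 1000 bytes.

```python
from collections import OrderedDict

BYTES_PER_INODE = 1000  # 1 KB per file as quoted on the slide (decimal units assumed)


def heap_bytes(num_files: int) -> int:
    """Estimated heap needed to keep every file's metadata in memory."""
    return num_files * BYTES_PER_INODE


class CachedInodeStore:
    """Hypothetical LRU cache of hot inodes in front of a larger backing store.

    The backing store is a plain dict standing in for an embedded
    key-value store on local disk.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache = OrderedDict()  # inode id -> metadata, least recently used first
        self.backing = {}           # stand-in for the on-disk store
        self.hits = 0
        self.misses = 0

    def put(self, inode_id, metadata):
        self.backing[inode_id] = metadata  # write through to the backing store
        self._remember(inode_id, metadata)

    def get(self, inode_id):
        if inode_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(inode_id)  # mark as most recently used
            return self.cache[inode_id]
        self.misses += 1
        metadata = self.backing[inode_id]  # raises KeyError if the inode is unknown
        self._remember(inode_id, metadata)
        return metadata

    def _remember(self, inode_id, metadata):
        self.cache[inode_id] = metadata
        self.cache.move_to_end(inode_id)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
```

With these assumptions, `heap_bytes(10**9)` gives 10**12 bytes, the 1 TB figure above, while the cache keeps only `capacity` inodes on heap.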
11. Scaling to 1 Billion+ Files
Available in 2.0.0
Alluxio Master
Local Disk
RocksDB (Embedded)
● Inode Table
● Edge Table
● Block Table
● Block to Worker Table
● Worker to Block Table
On Heap
● Inode Cache
● Mount Table
● Locks
Inode Table:
Inode ID    Metadata (Binary)
12392       010101101101
12393       110110110100
…           …
Edge Table:
Edge (ID, name)    Inode ID
12392,foo          12393
…                  …
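One way to read the two tables: a path is resolved by walking the edge table one component at a time, then looking up the final inode ID in the inode table. The sketch below assumes this lookup scheme. The root inode ID `0` and the directory name `data` are hypothetical; only the `(12392, foo) -> 12393` edge and the two metadata values come from the slide.

```python
ROOT_ID = 0  # hypothetical root inode id, not from the slide

# Edge table: (parent inode id, child name) -> child inode id
edge_table = {
    (ROOT_ID, "data"): 12392,  # "data" is a made-up directory name
    (12392, "foo"): 12393,     # the row shown on the slide
}

# Inode table: inode id -> opaque serialized metadata
inode_table = {
    12392: b"010101101101",
    12393: b"110110110100",
}


def resolve(path, edges, root_id=ROOT_ID):
    """Return the inode id for an absolute path by following edges from the root."""
    inode_id = root_id
    for name in path.strip("/").split("/"):
        if not name:  # skip the empty component produced by "/"
            continue
        inode_id = edges[(inode_id, name)]  # KeyError if the component does not exist
    return inode_id


def lookup_metadata(path):
    """Resolve a path, then fetch its metadata from the inode table."""
    return inode_table[resolve(path, edge_table)]
```

Keying edges by `(parent ID, name)` means each path component costs one point lookup, which suits a key-value store such as RocksDB.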
12. Efficient cluster communication with gRPC
Available in 2.0.0
[Diagram, before: Alluxio Client, Alluxio Master, and Alluxio Workers communicate over Thrift (Metadata) and Netty (IO)]
[Diagram, after: Alluxio Client, Alluxio Master, and Alluxio Workers communicate over gRPC (Metadata + IO)]
14. Replicated Asynchronous Writes
[Diagram: Application and Alluxio Client write data at network speed to two Alluxio Workers (RAM / SSD / HDD); the Alluxio Workers connect to the Under Store]
Available in 2.0.0
Fast and reliable writes to Alluxio, with data persisted in background
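The write path can be sketched as: copy the data to several workers before acknowledging, then persist to the under store in the background. This is a toy in-process simulation, not Alluxio's client API. `Worker`, `AsyncWriter`, the `replicas` parameter, and the choice of the first N workers as targets are all assumptions.

```python
import queue
import threading


class Worker:
    """Hypothetical worker holding blocks in local storage (RAM / SSD / HDD)."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}


class AsyncWriter:
    """Writes to `replicas` workers synchronously, persists asynchronously."""

    def __init__(self, workers, under_store, replicas=2):
        self.workers = workers
        self.under_store = under_store  # dict standing in for the under store
        self.replicas = replicas
        self._pending = queue.Queue()
        self._thread = threading.Thread(target=self._persist_loop, daemon=True)
        self._thread.start()

    def write(self, path, data):
        """Replicate to workers, queue the persist job, and return immediately."""
        targets = self.workers[: self.replicas]
        for worker in targets:
            worker.blocks[path] = data
        self._pending.put((path, data))
        return [worker.name for worker in targets]

    def _persist_loop(self):
        # Background thread: drain queued writes into the under store.
        while True:
            path, data = self._pending.get()
            self.under_store[path] = data
            self._pending.task_done()

    def wait_until_persisted(self):
        """Block until every queued write has reached the under store."""
        self._pending.join()
```

The caller sees the write complete once the replicas exist, so a single worker failure before the background persist finishes does not lose the data.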
15. Policy Driven Data Management
Available in 2.0.0
Alluxio Master: Alluxio Policy Engine
Example Policy: Move files older than 90 days from HDFS to S3
Application: Apps access the same path regardless of where the actual data is stored
Decouple the logical file system namespace from physical storage systems
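The example policy can be sketched as a rule that changes only a file's physical location and leaves its logical path untouched. This is an illustration under stated assumptions, not the Alluxio Policy Engine's interface: `FileEntry`, `apply_age_policy`, and the use of modification time to measure age are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FileEntry:
    logical_path: str   # the path applications use; never changes
    store: str          # physical location, e.g. "HDFS" or "S3"
    modified: datetime  # age is measured from this timestamp (an assumption)


def apply_age_policy(files, now, max_age_days=90, source="HDFS", target="S3"):
    """Move files older than max_age_days from source to target.

    Returns the logical paths of the files that were moved.
    """
    cutoff = now - timedelta(days=max_age_days)
    moved = []
    for entry in files:
        if entry.store == source and entry.modified < cutoff:
            entry.store = target  # only the physical location changes
            moved.append(entry.logical_path)
    return moved
```

For example, with `now = datetime(2019, 11, 7)`, a file in HDFS last modified on January 1, 2019 would move to S3, while one modified on November 1, 2019 would stay put. Both remain reachable at their original logical paths.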
17. Deeper Integration with Presto
§ New Alluxio Catalog Service
• Provides an abstraction of structured data
• Attach a Hive Metastore like mounting a file system
• Understands and serves the schema of files or objects
§ New Alluxio Data Transformation Service
• Transform CSV → Parquet
• Compact many files → fewer files
Presto Alluxio Connector based off the Hive Connector
Now available as Developer Preview
19. Fast Growing Developer Community
Started as Haoyuan Li’s PhD project “Tachyon”
Open sourced under Apache 2.0 License
Contributors by release:
Version    Date       Contributors
v0.1       Dec ‘12    1
v0.2       Apr ‘13    3
v0.6       Mar ‘15    70
v1.0       Feb ‘16    210
v1.8       Jul ‘18    750
v2.1       Nov ‘19    1080
20. Great Community Collaborations
Available in 2.1.0
§ Deeper Integration with Presto
• Collaboration w/ Presto maintainers
§ Kubernetes Helm Chart
• Collaboration w/
§ Improved Alluxio POSIX interface and distributed operations
• Contributed by
§ Kubernetes Container Storage Interface (CSI) Implementation
• Contributed by individual contributor Mingfang
21. Deployed in Hundreds of Companies
Industries: Consumer, Travel & Transportation, Telco & Media, Technology, Financial Services, Retail & Entertainment, Data & Analytics Services
https://www.alluxio.io/powered-by-alluxio/
22. Deployed at Scale in Different Environments
On-Prem
• Huya: 1300 nodes
• Sogou: 1000 nodes
• JD.com: 1000 nodes
• Momo: 850 nodes
• Tencent: 400 nodes
Single Cloud
• Bazaarvoice: AWS
• Ryte: AWS
• Myntra: AWS
• Cuelogic: AWS
• Walmart Labs: GCP
Hybrid Cloud
• DBS Bank
• ING Bank
• Comcast
• Ligadata
• Qiniu Cloud
23. Community Activities Around the World
New York, March 2019
Seattle, March 2019
Singapore, April 2019
Bay Area, Jun 2019
Beijing, Jun 2019
Austin, Aug 2019
24. Join Our User Community
Join Slack channel
alluxio.io/slack
WeChat Public Account
Join meetup groups near you
alluxio-open-source-community/