In layman's terms, this is a file system whose essential activities are identifying hot and cold data, moving cold data to secondary storage (Dropbox here), and retrieving cold data from secondary storage.
In this project we implemented this file system and handled the general and specific cases needed for seamless transfer of data from hot to cold and from cold to hot.
1. HOT COLD
Unified Virtual File System
For Hot & Cold Data Storage
Aditya Ambre, Madhura S. Raghavan, Rohit Arora
ENTERPRISE STORAGE ARCHITECTURE
GROUP 2
2. HOT COLD
CSC 568 Enterprise Storage Architecture (NC State University)
AGENDA
➔ Problem Statement
➔ Project Goals and Features
➔ Architecture and Workflow
➔ Verification Cases
➔ Summary
3. HOT COLD
PROBLEM STATEMENT
➔ Lifecycle of data.
◆ Access frequency.
◆ Storage capacity and hardware characteristics.
➔ User intervention: running jobs/scripts.
➔ Acknowledging data temperature.
➔ Tight coupling needed between storage components.
[Figure labels: Frequently Accessed Data; Least Frequently Accessed Data]
4. HOT COLD
WHAT IS A HOT FILE?
A data file that
➔ Is very frequently accessed.
➔ Mostly contains business-critical information.
➔ Needs to be accessed quickly.
5. HOT COLD
WHAT IS A COLD FILE?
A data file that
➔ Is infrequently accessed.
➔ Contains less important information.
➔ Need not be accessed quickly.
6. HOT COLD
GOAL: WHAT IS OUR PROJECT?
➔ From decoupled storage components to a tightly coupled two-tiered storage system.
➔ Manage hot & cold data between primary and secondary storage.
➔ Manage primary storage space utilization.
➔ File transfers do not interrupt FS operations.
➔ File transfer and storage are transparent to the user.
➔ Optimal storage of cold data.
8. HOT COLD
FEATURES
➔ Infinite Storage illusion
➔ Automatic cold data identification and transfer
➔ Consistent CRUD operations for both hot and cold files
➔ Block level storage
➔ On the fly deduplication
➔ Uninterrupted file access
➔ File level Consistency
➔ Optimal storage space utilization
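The block-level storage and on-the-fly deduplication features can be illustrated with a minimal sketch. It is not the project's implementation: the 4096-byte block size, the use of SHA-256, the in-memory dictionary standing in for cold storage, and the names `split_into_blocks` and `store_file` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size; the slides do not state one


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def store_file(data: bytes, cold_store: dict[str, bytes]) -> list[str]:
    """Store unique blocks in cold_store and return the file's ordered hash list."""
    hashes = []
    for block in split_into_blocks(data):
        digest = hashlib.sha256(block).hexdigest()  # SHA-256 is an assumption
        if digest not in cold_store:  # a block already present is not sent again
            cold_store[digest] = block
        hashes.append(digest)
    return hashes
```

Keying blocks by their content hash means that two identical blocks, whether in the same file or in different files, occupy cold storage only once, while the ordered hash list is enough to rebuild the file later.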
9. HOT COLD
OUR ARCHITECTURE
[Figure: layered architecture diagram with the following components]
➔ APPLICATION: Write, Read.
➔ FUSE OPERATIONS: Read, Write, Delete, Rename, etc.
➔ File Tracking Layer: Hot File Tracking, Cold File Tracking.
➔ Data Block Processing Layer: Write block to cold, Get block from cold, De-duplication.
➔ COLD STORAGE.
Other figure labels: Hot File; Cold File; block hashes 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
10. HOT COLD
HOT-TO-COLD WORKFLOW
[Figure: APPLICATION (Write) -> FUSE {WRITE} OPERATIONS -> File Tracking Layer -> Data Block Processing Layer -> COLD STORAGE]
13. HOT COLD
HOT-TO-COLD WORKFLOW
File Tracking Layer (Cold File Tracking)
1. List all the files.
2. Sort files by access time, oldest to newest.
3. Select files to be transferred (till <=50%).
4. Sort the selected files by size, large to small.
5. Send the largest and least recently accessed files to the Data Block Processing Layer.
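The five selection steps above can be sketched as follows. This is an illustration only: it assumes "till <=50%" means that files are selected until the projected utilization of hot storage is at or below 50%, and the `FileInfo` record and `select_cold_candidates` function are hypothetical names, not taken from the project.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    path: str
    atime: float  # last access time, seconds since the epoch
    size: int     # size in bytes


def select_cold_candidates(files, used_bytes, capacity_bytes, target=0.50):
    """Return the files to move to cold storage, largest first.

    Assumption: selection continues until the projected utilization of
    hot storage is at or below `target`.
    """
    by_age = sorted(files, key=lambda f: f.atime)  # step 2: oldest access first
    selected = []
    projected = used_bytes
    for f in by_age:                               # step 3: select until target
        if projected <= target * capacity_bytes:
            break
        selected.append(f)
        projected -= f.size
    # Step 4: among the selected files, the largest are transferred first.
    return sorted(selected, key=lambda f: f.size, reverse=True)
```

Sorting by access time first and by size second keeps recently used files hot while freeing the most space with the earliest transfers.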
22. HOT COLD
COLD-TO-HOT WORKFLOW
[Figure: APPLICATION (Read Request) -> FUSE {READ} OPERATIONS -> File Tracking Layer -> Data Block Processing Layer -> COLD STORAGE]
Check: Is File on Hot Storage?
Block hashes shown: 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
23. HOT COLD
COLD-TO-HOT WORKFLOW
[Figure: APPLICATION (Read Request) -> FUSE {READ} OPERATIONS -> File Tracking Layer -> Data Block Processing Layer -> COLD STORAGE]
Check: Is File on Hot Storage? No -> Get block from cold.
Block hashes shown: 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
24-28. HOT COLD
COLD-TO-HOT WORKFLOW
Data Block Processing Layer (Get Block from Cold), exchanging with COLD STORAGE:
1. Request copy of Hashtable.
2. Get Hashtable.
3. Read block presence on cold (is the block present?).
4. Request/Get block from cold.
5. Write transferred block content to memory block.
6. Construct complete file (Block 1, Block 2, Block 3).
7. Delete copy of Hashtable.
[Figure: Hashtable entries 2f0f3ff2…, 7439635…, e7faa85…, 3f35ec5f…, e4ae0b9...]
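The retrieval steps of the Data Block Processing Layer can be sketched roughly as below. The sketch assumes that the Hashtable maps each file to an ordered list of block hashes and that cold storage can be modeled as a dictionary from hash to block bytes; `restore_file`, `cold_tables`, and `cold_blocks` are hypothetical names used only for illustration.

```python
def restore_file(path, cold_tables, cold_blocks):
    """Rebuild a cold file's bytes from its block hashes.

    cold_tables: hypothetical mapping of path -> ordered list of block hashes
    cold_blocks: hypothetical mapping of block hash -> block bytes
    """
    table = list(cold_tables[path])        # steps 1-2: fetch a local copy
    try:
        buffer = bytearray()
        for digest in table:
            if digest not in cold_blocks:  # step 3: check block presence
                raise KeyError(f"block {digest} missing on cold storage")
            block = cold_blocks[digest]    # step 4: request/get the block
            buffer.extend(block)           # step 5: write it into memory
        return bytes(buffer)               # step 6: the complete file
    finally:
        table.clear()                      # step 7: discard the local copy
```

Working on a local copy of the table means the copy kept on cold storage is untouched by the read, and the `finally` clause discards the local copy even when a block turns out to be missing.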
29. HOT COLD
COLD-TO-HOT WORKFLOW
[Figure: APPLICATION (Read Request) -> FUSE {READ} OPERATIONS -> File Tracking Layer -> Block Read Request -> Data Block Processing Layer (Get block from cold) -> COLD STORAGE]
Check result: No.
Block hashes shown: 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
30. HOT COLD
MINIMAL THRESHOLD WORKFLOW
[Figure: APPLICATION (Some Operation) -> FUSE {READ} OPERATIONS -> File Tracking Layer -> Block Read Request -> Data Block Processing Layer (Get block from cold) -> COLD STORAGE]
Check: Storage <= 30%? Yes.
Other figure labels: Get Cold File; Hot File Tracking.
Block hashes shown: 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
31. HOT COLD
READ OPERATION WORKFLOW
[Figure: APPLICATION (Some Operation) -> FUSE {READ} OPERATIONS -> File Tracking Layer -> Block Read Request -> Data Block Processing Layer (Get block from cold) -> COLD STORAGE]
Check: Storage > 30% & < 70%? Yes.
Other figure labels: Get Cold File; Hot File Tracking.
Block hashes shown: 2f0f3ff2c7439635e7faa85…, 3f35ec5fe4ae0b963779c8…, 4a8f9ec938243beac4b2d…
32. HOT COLD
QUICK DEMO
33. HOT COLD
SCENARIOS / VERIFICATION CASES
I. GENERAL
➔ File system 70% full -> Transfer to cold storage.
➔ File system usage drops below 30% -> Transfer from cold storage.
➔ File transfers -> Do not interrupt general FS operations.
➔ Redundant/duplicate blocks -> Not transferred.
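The two utilization thresholds in the general cases can be expressed as a small decision function. This is a sketch under assumptions: the function name `tiering_action` and its return values are hypothetical, and treating exactly 70% and exactly 30% as triggers follows the "Storage <= 30%" and "> 30% & < 70%" checks on the workflow slides rather than anything stated here.

```python
HIGH_WATERMARK_PCT = 70  # from the slides: transfer to cold at 70% full
LOW_WATERMARK_PCT = 30   # from the slides: transfer from cold at low usage


def tiering_action(used_bytes: int, capacity_bytes: int) -> str:
    """Decide which background transfer, if any, the current usage calls for."""
    if capacity_bytes <= 0:
        raise ValueError("capacity_bytes must be positive")
    # Integer arithmetic avoids floating-point surprises at the boundaries.
    if used_bytes * 100 >= HIGH_WATERMARK_PCT * capacity_bytes:
        return "hot_to_cold"
    if used_bytes * 100 <= LOW_WATERMARK_PCT * capacity_bytes:
        return "cold_to_hot"
    return "none"
```

On a real mount point the two inputs could come from Python's `shutil.disk_usage(path)`, which reports total and used bytes. The gap between the two thresholds keeps the system from flapping between transfers in opposite directions.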
34. HOT COLD
SCENARIOS / VERIFICATION CASES
II. SPECIFIC
➔ Files transferred -> Based on access time and size.
➔ File removed from hot storage -> After its last block is transferred.
➔ File in transition accessed -> Abort transfer, access granted!
➔ File space reclamation and file access -> Synchronized.
➔ Only one background process runs at any given time.
➔ Delayed delete (rm) -> Transparent to user.
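Two of these cases, aborting a transfer when the file in transition is accessed and allowing only one background transfer at a time, can be sketched with standard threading primitives. The `TransferGuard` class and its methods are hypothetical and only illustrate one way such synchronization could be arranged; the slides do not describe the project's actual mechanism.

```python
import threading


class TransferGuard:
    """Hypothetical helper: one background transfer at a time, abortable per file."""

    def __init__(self):
        self._worker_lock = threading.Lock()  # held while a transfer is running
        self._state_lock = threading.Lock()   # protects _in_transit
        self._in_transit = None
        self._abort = threading.Event()

    def transfer(self, path, blocks, send_block):
        """Send blocks in order; return True if finished, False if aborted or busy."""
        if not self._worker_lock.acquire(blocking=False):
            return False                      # another transfer is already running
        try:
            with self._state_lock:
                self._in_transit = path
                self._abort.clear()
            for block in blocks:
                if self._abort.is_set():
                    return False              # file was accessed: it stays hot
                send_block(block)
            return True                       # caller may now remove the hot copy
        finally:
            with self._state_lock:
                self._in_transit = None
            self._worker_lock.release()

    def on_access(self, path):
        """Called from the file system's read/write path."""
        with self._state_lock:
            if self._in_transit == path:
                self._abort.set()
```

Because the hot copy is removed only after `transfer` returns True, an aborted transfer leaves the file fully available on hot storage, which matches the "access granted" behavior in the list above.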
35. HOT COLD
ASSUMPTIONS
➔ Network is always available.
➔ Hot-Cold classification at file level
➔ Cold Storage is infinite.
➔ Files are not very small or very large.
➔ Delay is accepted for rarely accessed files.
➔ File access granularity – in seconds.
36. HOT COLD
SUMMARY
➔ Acknowledged data temperatures - hot and cold
➔ Project Features
◆ Auto file identification.
◆ File transfer
◆ Deduplication
➔ Architecture and workflows in action.
➔ Design and implementation of file tracking layer
➔ Design and implementation of Data Block Processing Layer
➔ Design decisions for specific verification scenarios.
37. HOT COLD
FUTURE SCOPE
➔ Variable block size and Block size specifications.
➔ Garbage collection on secondary/cold storage.
➔ Cold file identification parameters and profiles.
➔ Distributed cold storage.
38. HOT COLD
REFERENCES
1. S. Quinlan and S. Dorward, “Venti: A new approach to archival storage,” in Proceedings of the First USENIX Conference on File and Storage Technologies (FAST), 2002. http://plan9.bell-labs.com/sys/doc/venti/venti.pdf
2. Chuanyi Liu, Dapeng Ju, et al., “Semantic data de-duplication for archival storage systems,” in Proceedings of the 13th IEEE Asia-Pacific Computer Systems Architecture Conference (ACSAC 2008), Hsinchu, Taiwan, August 2008.
3. Sean Quinlan, Jim McKie, and Russ Cox, “Fossil, an Archival File Server,” Lucent Technologies Bell Labs, unpublished memorandum (September 2003).
4. http://www.storiant.com/resources/Cold-Storage-Is-Hot-Again.pdf
5. “What is Unified Storage system,” http://searchstorage.techtarget.com/definition/unified-storage
6. File System in User Space, http://fuse.sourceforge.net/
39. HOT COLD
QUESTIONS ?