This presentation covers the challenges researchers face in transferring large data sets. Which tools are currently available? How can data transfer nodes be used as part of a general-purpose solution?
Valuable transfer nodes for research - Joseph Hill (UvA) - Netwerkdag 2019
1. User Friendly Data Sharing
with Data Transfer Nodes
Joseph Hill
21 May 2019
Funded by SURFnet
2. How Do Researchers Exchange Data?
● Ideally users could send directly to each other utilizing all available bandwidth
● Different solutions depending on file size and proximity
● Researchers may be reluctant to learn a new tool
● An alternative to sending hard drives by post is needed
● Goal: Provide researchers a means of performing high-speed data exchanges independent of the location of the sender and receiver while minimizing user-facing changes
3. Data Transfer Node (DTN)
● A specialized system to conduct high performance data transfers
● Can be used in a variety of workflows
● Many possible configurations depending on the workflow
4. Data Transfer Nodes at UvA
● Linux system designed for high I/O
● Tuned for high throughput over long distances
● Transient storage
● 10/40/100 Gbps Network Interface
○ @100 Gbps a 1 TB file can be transferred in about 1 minute 30 seconds
○ @1 Gbps a 1 TB file can be transferred in about 2 hours 30 minutes
● Fast Storage
○ NVMe Drives or SSD RAID arrays
○ PCIe 3.0 x4 transfer rate of 32 Gbps
● As much compute and memory as necessary
○ Can be built for single thread or multi-threaded performance
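The transfer times quoted for the network interfaces can be checked with simple arithmetic. The sketch below computes the ideal time with no protocol or storage overhead, assuming a decimal terabyte (10^12 bytes); the function names are illustrative only. The ideal values come out slightly lower than the slide's figures, which presumably leave some margin for overhead.

```python
def transfer_seconds(size_bytes: float, link_gbps: float) -> float:
    """Ideal time to move size_bytes over a link_gbps link (no overhead)."""
    bits = size_bytes * 8
    return bits / (link_gbps * 1e9)


def fmt(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes, and seconds."""
    hours, rem = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


ONE_TB = 1e12  # assumption: decimal terabyte

for gbps in (1, 10, 40, 100):
    print(f"{gbps:>3} Gbps: {fmt(transfer_seconds(ONE_TB, gbps))}")
```

Under these assumptions, 1 TB takes 80 seconds at 100 Gbps and 8000 seconds (about 2 hours 13 minutes) at 1 Gbps.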
5. FileSender
● Uses HTML5 for transfers
○ Multiple TCP streams for increased performance
○ Supports large files
○ End to end encryption
● Only one user needs an account
○ Only requires email address of other user
○ Account holder can be sender or recipient
https://filesender.surf.nl/
6. Data Transfers with FileSender
1. The sender provides the email address of the recipient
2. The sender uploads the file(s) to the FileSender Server
3. The recipient is sent an email with a download link
4. The recipient downloads the file
● Works well with low to moderate latency
7. Data Transfers with FileSender
● High latency negatively impacts performance
○ Primarily due to the configuration on the user’s system
● Occurs whether the FileSender server is close to the sender or receiver
● Particularly an issue with intercontinental data transfers
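One likely reason latency matters so much is the TCP window: a single TCP stream can send at most one window of data per round trip, so throughput is bounded by window size divided by round-trip time. The slides do not say which setting is responsible, so the sketch below is only an illustration; the 64 KiB window and the round-trip times are assumed example values, not measurements.

```python
def max_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound for one TCP stream: one window per round trip."""
    return window_bytes * 8 / (rtt_ms / 1000) / 1e6


def required_window_bytes(link_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: the window needed to keep the link full."""
    return link_gbps * 1e9 / 8 * (rtt_ms / 1000)


WINDOW = 64 * 1024  # assumption: an untuned 64 KiB window

for rtt in (10, 150):  # assumed regional vs intercontinental round-trip times
    print(f"RTT {rtt:>3} ms: at most {max_throughput_mbps(WINDOW, rtt):.1f} Mbps")

needed_mb = required_window_bytes(10, 150) / 1e6
print(f"Window to fill 10 Gbps at 150 ms: {needed_mb:.1f} MB")
```

With these example values the bound drops from roughly 52 Mbps at 10 ms to about 3.5 Mbps at 150 ms, regardless of how fast the link is.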
8. DTN Demonstration
● Workstations not optimized for long-distance transfers can benefit by using an optimized path with DTNs
● Scenario shows a file transfer from a workstation in the US to a workstation in Europe
● Compares a file transfer over an optimized path with DTNs versus a path over the internet
● The indirect path utilizing DTNs achieves substantially better performance than a file transfer directly between the workstations
● Just one example of how DTNs can act as an interface to a path optimized for a specific data transfer
9. Data Transfers with FileSender
● Each user has access to a nearby FileSender instance
● Make FileSender aware of peer server
● FileSender systems can be tuned for high throughput despite high latency
10. ownCloud
● Data lives within the system
● Support for sharing between users
● Support for federation
ownCloud changes done by Antoon Prins of SURFsara
11. DTN Agent
● Provides the user facing application with an interface to a high speed link
● Applications do not need to be the same across domains
● Agent has a module for each application that it supports
● Provides a collection of services
○ Lookup (Peer Discovery)
○ Authentication and Authorization
○ Network Provisioning (optional)
○ Data Transfer
○ Monitoring and Notification
13. DTN Agent - Lookup Service
● What do we want the sender to have to know about the recipient?
● DNS service records can be used similarly to MX records
● Users may want to set a different destination for themselves
● Service record may point to where to find more specific information
_dtn._tcp.uva.nl IN SRV 0 0 3001 dtn.uva.nl.
_dtn._tcp.surf.nl IN SRV 0 0 3001 dtn.surf.nl.
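As with MX records for mail, a sender's agent could derive the lookup name from the recipient's email address alone. The sketch below shows that derivation and parses records in the format shown above. It is a minimal illustration, not the agent's actual code: the function names and the example address are hypothetical, no DNS query is performed, and the weighted selection among equal priorities described in RFC 2782 is omitted.

```python
from typing import Iterable, NamedTuple


class SrvRecord(NamedTuple):
    name: str
    priority: int
    weight: int
    port: int
    target: str


def srv_name_for(email: str, service: str = "dtn", proto: str = "tcp") -> str:
    """Build the SRV owner name from the domain part of an email address."""
    domain = email.rsplit("@", 1)[1].lower()
    return f"_{service}._{proto}.{domain}"


def parse_srv(line: str) -> SrvRecord:
    """Parse a 'name IN SRV priority weight port target' line."""
    name, klass, rtype, priority, weight, port, target = line.split()
    if (klass, rtype) != ("IN", "SRV"):
        raise ValueError(f"not an SRV record: {line!r}")
    return SrvRecord(name, int(priority), int(weight), int(port), target.rstrip("."))


def pick(records: Iterable[SrvRecord]) -> SrvRecord:
    """Choose the record with the lowest priority value (weights ignored)."""
    return min(records, key=lambda r: r.priority)


records = [
    parse_srv("_dtn._tcp.uva.nl IN SRV 0 0 3001 dtn.uva.nl."),
    parse_srv("_dtn._tcp.surf.nl IN SRV 0 0 3001 dtn.surf.nl."),
]
wanted = srv_name_for("researcher@uva.nl")  # hypothetical address
best = pick(r for r in records if r.name == wanted)
print(f"{wanted} -> {best.target}:{best.port}")
```

In a real deployment the records would come from a DNS resolver rather than from hard-coded strings.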
14. DTN Agent - Data Transfer
● Protocol negotiated by agents
○ Type of dataset may affect choice of protocol
● GridFTP, FDT, mdtmFTP, SCP
● Each transfer application implemented as a module
● Need to integrate security mechanisms
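The slides do not describe how the agents negotiate a protocol. One simple possibility, sketched here purely as an assumption, is for the sending agent to walk its own preference list and take the first protocol the peer also supports. The function name and the preference order are illustrative.

```python
from typing import Iterable, Optional, Sequence


def negotiate(local_prefs: Sequence[str], remote_supported: Iterable[str]) -> Optional[str]:
    """Return the first locally preferred protocol the peer supports, if any."""
    remote = set(remote_supported)
    for protocol in local_prefs:
        if protocol in remote:
            return protocol
    return None


# Assumed preference order on the sending side, using the protocols listed above.
sender_prefs = ["GridFTP", "FDT", "mdtmFTP", "SCP"]
receiver_modules = ["SCP", "FDT"]

print(negotiate(sender_prefs, receiver_modules))
```

A per-dataset preference list (for example, one order for many small files and another for a single large file) would fit the point that the type of dataset may affect the choice of protocol.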
15. DTN Agent - Monitoring and Notification
● Monitor the status of the data transfer
● Take corrective action in the event of failures
● Provide status information to applications
● Push notifications to application and administrators as needed
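The monitoring duties listed above could be combined in a small retry loop: run the transfer, report each outcome, and retry on failure. This is only a sketch of the idea; `run_with_retries`, its parameters, and the choice of `OSError` as the failure signal are assumptions, and the slides do not specify what corrective action the agent actually takes.

```python
from typing import Callable


def run_with_retries(
    transfer: Callable[[], None],
    notify: Callable[[str], None],
    max_attempts: int = 3,
) -> bool:
    """Run transfer(), notifying after each attempt; retry on OSError."""
    for attempt in range(1, max_attempts + 1):
        try:
            transfer()
        except OSError as exc:
            notify(f"attempt {attempt} failed: {exc}")
        else:
            notify(f"attempt {attempt} succeeded")
            return True
    notify(f"giving up after {max_attempts} attempts")
    return False
```

Here `notify` stands in for whatever channel pushes status to applications and administrators, and `transfer` wraps the chosen transfer module.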
16. Future Work
● Authentication and Authorization
○ Many security questions remain
○ Current implementation is similar to email: authenticate to send, receive everything
○ One-time tokens
○ Transfer applications lack desired security features
● Network Provisioning
○ Dynamically set up links between DTNs (AutoGOLE)
○ Perform resource reservation
● Determine a protocol of last resort
○ May use custom protocol